1. Introduction



In recent years, information geometry has emerged as a new field in
mathematics, where geometric ideas and methods are exploited as principal
tools to study mathematical statistics and related problems in information
theory, neural networks, and system theory [4]. Information geometry has
also been identified as a natural formalism for complexity theory [5], [6].
In particular, complex networks can be analyzed with tools from information
geometry.

Information geometry began as the investigation of natural
differential-geometric structures on statistical models. Rao first suggested
the idea of considering the Fisher information as a Riemannian metric on
statistical models [29]. A systematic study of geometric structures on
families of probability distributions was initiated by Chentsov [12], [14]
as the first steps in his program of geometrization of mathematical
statistics [13], [26]. As a result, based on an idea by Morozova, Chentsov
found a family of natural torsion-free connections on statistical models
[14]. This family of torsion-free connections was discovered independently
by Amari [1], [2]. In the presence of the Fisher metric and the associated
torsion-free metric connection, this family of torsion-free connections is
defined by a 3-symmetric tensor, called the Amari-Chentsov tensor, on the
underlying statistical model $(M, \Omega, \mu, p)$ (Definition 2.3, [3],
[23]). Here, $M$ is a manifold which serves as a parameter space for a set
of finite measures on $\Omega$. More precisely, $p$ assigns to each element
$x$ of the manifold a finite measure $p(x)$ that is absolutely continuous
with respect to the reference measure $\mu$. This explicit parametrization
allows us to define natural geometric structures on $M$ that are inherited
from the set $\mathcal{M}(\Omega, \mu)$ of non-negative finite measures
dominated by $\mu$. The Fisher metric, or more generally, the Fisher
quadratic form, denoted by $g$, and the Amari-Chentsov tensor, denoted by
$T$, are examples of such structures on statistical models which play a
fundamental role in information geometry. Therefore, $(M, g, T)$ has been
considered as the basic mathematical object within information geometry,
also referred to as a statistical model or statistical manifold [22], [23].
However, in full generality this object is decoupled from the measure- and
information-theoretic context. Including the embedding $p$ as part of our
notion of a statistical model allows us to treat the elements of $M$ as
measures on $\Omega$ and thereby use concepts from measure theory,
information theory, and statistics. One of the most important properties of
the Fisher quadratic form and the Amari-Chentsov tensor is the invariance of
these structures under statistics $\kappa : \Omega_1 \to \Omega_2$ that are
sufficient for the parameter $x \in (M, \Omega_1, \mu, p)$ (Definition 4.1,
Theorem 4.5). In other words, the Fisher quadratic form and the
Amari-Chentsov tensor on $(M, \Omega_1, \mu, p)$ and
$(M, \Omega_2, \kappa_*(\mu), \kappa_*(p))$ coincide if $\kappa$ is
sufficient. Sufficient statistics represent important transformations
between parametrized measure models, since they preserve the information of
the underlying models. Thus one wishes to know whether there are other
quadratic forms and 3-symmetric tensors on parametrized measure models which
are invariant under sufficient statistics. This question was answered by
Chentsov in the negative for statistical models associated with finite
sample spaces [14]; see also our discussion in Section 4 (Proposition 4.17).
However, one naturally wishes to consider infinite sample spaces $\Omega$,
and in this case the space of measures becomes infinite-dimensional, and the
topological aspects become more subtle. More precisely, the main difficulty
in extending the Chentsov theorem to all parametrized measure models is
caused by two facts. Firstly, a statistical model associated with a finite
sample space can be regarded locally as a submanifold in a universal
statistical model
$(\mathcal{P}_+(\Omega_n, \mu_n), \Omega_n, \mu_n, \mathrm{Id})$, which is a
finite-dimensional open simplex (Example 2.5). In this case, it suffices to
consider the Fisher metric, the Amari-Chentsov tensor and other tensor
fields on this open simplex. Secondly, the structure of sufficient
statistics associated with the considered statistical models can be
described in terms of Markov congruent embeddings [14]; see also our
discussion at the end of Section 4. It is not easy to generalize these facts
to statistical models associated with an infinite sample space, since, in
particular, there is no canonical smooth structure on the set
$\mathcal{M}_+(\Omega, \mu)$ of all measures equivalent to $\mu$, or on the
set $\mathcal{M}(\Omega, \mu)$ of all measures dominated by $\mu$.

Note that the set $\mathcal{M}(\Omega, \mu)$ can naturally be considered as
a subset of the Banach space $L^1(\Omega, \mu)$ (notation and more details
are in §2); thus we could define the notion of a differentiable map from a
smooth manifold $M$ into $L^1(\Omega, \mu)$. But the smooth structure on
$L^1(\Omega, \mu)$ is too weak to accommodate naturally the Fisher metric
and the Amari-Chentsov tensor on submanifolds in $L^1(\Omega, \mu)$. In [28]
and subsequent papers, Pistone and his coworkers constructed a new
infinite-dimensional Banach manifold structure on the set
$\mathcal{M}_+(\Omega, \mu)$ to set up a theory of non-parametric
statistical models, which also accommodates naturally the Fisher metric and
connections [10], [16]. Though their theory includes all statistical models
associated with a finite sample space, one can construct finite-dimensional
statistical models associated with an infinite sample space that are
excluded by their theory (cf. Example 3.11). One of the technical
difficulties in their theory is caused by the fact that the topology on the
considered Banach manifolds is so strong that the space of bounded random
variables is not dense in that topology [10, Lemma 2].

In this paper, we choose another approach. In order to define a tensor
field on parametrized measure models, we formally assume a global covariant
$n$-tensor field $\tau$ on $\mathcal{M}(\Omega)$ without specifying any
regularity properties. Instead, given a parametrized measure model
$(M, \Omega, \mu, p)$, we consider the corresponding tensor field $\tau^*$
on the Banach manifold $M$, which stands for the pull-back of $\tau$ with
respect to $p : M \to \mathcal{M}(\Omega, \mu)$, and require the regularity
conditions directly for $\tau^*$. This way, we have for each parametrized
measure model a tensor field $\tau^*$, implicitly inherited from the field
$\tau$, which we assume to satisfy particular conditions. This general
assignment constitutes our notion of a statistical tensor field
(Definition 2.7).

The structure of our paper is as follows. In Section 2 we introduce the
notion of a $k$-integrable parametrized measure model, which encompasses
all known examples in statistics. We compare our concept with the concept
of a geometrically regular statistical model proposed by Amari. At the end
of this section, we state our Main Theorem 2.9. In Section 3, we study the
relations between $k$-integrable parametrized measure models and statistical
models in the Pistone-Sempi theory. In Section 4 we introduce the notion of
sufficient statistics based on the Fisher-Neyman characterization
(Definition 4.1, Lemma 4.3). We give a simple proof that the Amari-Chentsov
structure is invariant under sufficient statistics (Theorem 4.5). At the end
of the section we discuss Chentsov's results on the uniqueness of the Fisher
metric and the Amari-Chentsov tensor (Proposition 4.17, Lemma 4.18). In
Section 5 we introduce the notion of a Markov morphism. A novel aspect of
our concept of Markov morphisms between parametrized measure models is the
consideration of smooth maps between the parameter spaces (Definition 5.3,
Example 5.5). Thus, the geometry of parametrized measure models is
intrinsic. We decompose a Markov morphism as a composition of a right
inverse of a sufficient statistic and a statistic (Theorem 5.10). As a
consequence we give a geometric proof of the monotonicity theorem for Markov
morphisms (Corollary 5.11). In Section 6 we give a proof of our Main
Theorem.



2. Parametrized measure models

In this section we describe the geometry of spaces of measures and of
parametrized families of measures. In technical terms, we introduce the
notion of a $k$-integrable parametrized measure model (Definition 2.3) and
the notion of tensor fields on them, following the locality and continuity
condition (Definition 2.1, Remark 2.4). We show that our notion of
generalized statistical models encompasses all statistical models considered
by Chentsov, Amari, and Pistone-Sempi (Remark 2.4, Example 2.5), and we
compare our concept with that of Amari (Remark 2.6).

Let $(\Omega, \Sigma)$ be a measurable space. Later on, $\Omega$ will also
have to carry a differentiable structure.

We consider the Banach space of all signed finite measures on $\Omega$ with
the total variation $\|\cdot\|_{TV}$ as Banach norm. More precisely, the
total variation of such a measure $\mu$ is defined as

$\|\mu\|_{TV} := \sup \sum_{i=1}^{n} |\mu(A_i)|,$

where the supremum is taken over all finite partitions
$\Omega = A_1 \,\dot\cup\, \dots \,\dot\cup\, A_n$ with disjoint sets
$A_i \in \Sigma$. We consider the subset $\mathcal{M}(\Omega)$ of all finite
non-negative measures on $\Omega$, and, with a $\sigma$-finite non-negative
measure $\mu_0$, we also consider the subspace

$\mathcal{S}(\Omega, \mu_0) = \{\mu = \phi\,\mu_0 : \phi \in L^1(\Omega, \mu_0)\}$

of signed measures dominated by $\mu_0$. This space can be identified in
terms of the canonical map
$i_{can} : \mathcal{S}(\Omega, \mu_0) \to L^1(\Omega, \mu_0)$,
$\mu \mapsto \frac{d\mu}{d\mu_0}$. Note that

$\|\mu\|_{TV} = \left\| \frac{d\mu}{d\mu_0} \right\|_{L^1(\Omega, \mu_0)},$

which implies that $i_{can}$ is a Banach space isomorphism. Therefore, we
refer to the topology of $\mathcal{S}(\Omega, \mu_0)$ also as the
$L^1$-topology. This is independent of the particular choice of the
reference measure $\mu_0$, because if $\phi \in L^1(\Omega, \mu_0)$ and
$\psi \in L^1(\Omega, \phi\mu_0)$, then $\psi\phi \in L^1(\Omega, \mu_0)$.
Throughout the paper, we consider the following hierarchy of subsets of
$\mathcal{S}(\Omega, \mu_0)$:

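$\mathcal{M}(\Omega, \mu_0) = \{\mu = \phi\,\mu_0 : \phi \in L^1(\Omega, \mu_0),\ \phi \ge 0\}$
$\mathcal{M}_+(\Omega, \mu_0) = \{\mu = \phi\,\mu_0 : \phi \in L^1(\Omega, \mu_0),\ \phi > 0\}$
$\mathcal{M}^a(\Omega, \mu_0) = \{\mu = \phi\,\mu_0 : \phi \in L^1(\Omega, \mu_0),\ \phi \ge 0,\ \mu(\Omega) = \|\mu\|_{TV} = a\}$
$\mathcal{P}(\Omega, \mu_0) = \{\mu \in \mathcal{M}(\Omega, \mu_0) : \mu(\Omega) = \|\mu\|_{TV} = 1\}$
$\mathcal{P}_+(\Omega, \mu_0) = \{\mu \in \mathcal{M}_+(\Omega, \mu_0) : \mu(\Omega) = \|\mu\|_{TV} = 1\}$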
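The identity between the total variation and the $L^1$-norm of the density can be checked by hand on a finite sample space. The following is a minimal sketch under that assumption; the function names and the numerical values are purely illustrative and do not come from the paper.

```python
def total_variation(mu):
    """Total variation of a signed measure on a finite set (dict: point -> mass).

    On a finite set the partition into singletons attains the supremum,
    so the norm is the sum of the absolute point masses.
    """
    return sum(abs(mass) for mass in mu.values())


def variation_over_partition(mu, partition):
    """Sum of |mu(A_i)| over one given partition (list of sets of points)."""
    return sum(abs(sum(mu[w] for w in block)) for block in partition)


def l1_norm_of_density(mu, mu0):
    """L^1(mu0)-norm of d(mu)/d(mu0); assumes mu0 has full support."""
    return sum(abs(mu[w] / mu0[w]) * mu0[w] for w in mu0)


# Hypothetical reference measure and a signed measure dominated by it.
mu0 = {"a": 0.5, "b": 0.25, "c": 0.25}
mu = {"a": 0.3, "b": -0.2, "c": 0.1}

tv = total_variation(mu)
coarse = variation_over_partition(mu, [{"a", "b"}, {"c"}])
l1 = l1_norm_of_density(mu, mu0)
```

A coarser partition can only lower the sum because positive and negative masses cancel inside a block, which is why the supremum is needed in the definition.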

In particular, for $\mu = \phi\,\mu_0 \in \mathcal{M}_+(\Omega, \mu_0)$,
i.e., $\phi > 0$, $\mu_0$ and $\mu$ have the same null sets and are
equivalent, that is, $\mu_0 = \phi^{-1}\mu \in \mathcal{M}_+(\Omega, \mu)$.
Thus, we have some kind of multiplicative structure on
$\mathcal{M}_+(\Omega, \mu_0)$, and one might hope to generate this via an
exponential map from the linear structure on $L^1(\Omega, \mu_0)$. The
problem, however, is that if $f \in L^1(\Omega, \mu_0)$, then we do not
necessarily have $e^f \in L^1(\Omega, \mu_0)$. When it is, then
$e^f \mu_0 \in \mathcal{M}_+(\Omega, \mu_0)$, but when it is not, the
measure $e^f \mu_0$ is not well defined. Thus, certain infinitesimal
deformations are obstructed, that is, cannot be integrated into local ones.
Of course, this does not happen when $\Omega$ is finite, the case treated by
Chentsov, and this is the technical reason why we need to work harder for
our main result. (Pistone and Sempi have analyzed the underlying topological
structure, and we shall describe their construction from our perspective in
Section 3. The essential point for an intuitive understanding of this
topology is that if $e^f \in L^1$, then for $0 < t < 1$, $e^{tf} \in L^p$
for $p = 1/t > 1$.)
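The obstruction to exponentiation can be made concrete with a toy example that is not taken from the paper: assume $\Omega = \{1, 2, \dots\}$, $\mu_0(\{n\}) = 2^{-n}$ and $f(n) = n$. Then $f \in L^1(\Omega, \mu_0)$, while $\int e^{tf}\,d\mu_0 = \sum_n (e^t/2)^n$ is finite only for $t < \ln 2$. The sketch below merely compares partial sums; all names are hypothetical.

```python
import math


def weight(n):
    """Mass of the point n under the assumed reference measure mu0."""
    return 2.0 ** (-n)


def partial_integral(integrand, n_terms):
    """Partial sum of the integral of `integrand` against mu0 over {1..n_terms}."""
    return sum(integrand(n) * weight(n) for n in range(1, n_terms + 1))


# f(n) = n is integrable: sum n 2^{-n} = 2.
integral_f = partial_integral(lambda n: float(n), 200)

# e^f is not integrable: the partial sums of (e/2)^n keep growing.
exp_f_50 = partial_integral(lambda n: math.exp(n), 50)
exp_f_100 = partial_integral(lambda n: math.exp(n), 100)

# e^{f/2} is integrable since e^{1/2}/2 < 1; the series is geometric.
ratio = math.exp(0.5) / 2.0
exp_half_f = partial_integral(lambda n: math.exp(0.5 * n), 200)
exp_half_f_exact = ratio / (1.0 - ratio)
```

In this example the tangent direction $f$ can only be exponentiated for parameters $t$ below $\ln 2$, which illustrates why not every $L^1$ direction integrates to a curve of finite measures.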

In order to avoid this issue and in order to make contact with the basic
construction of parametric statistics, we shall consider parametrized
families of measures, that is, differentiable maps
$M \to \mathcal{M}(\Omega)$ of smooth Banach manifolds $M$ into the
"universal measure set" $\mathcal{M}(\Omega)$, and attempt to pull geometric
structures from $\mathcal{M}(\Omega)$ back to $M$ by such maps. Since,
however, we may not be able to fully define these objects on
$\mathcal{M}(\Omega)$, we shall have to push forward tensors from $M$
instead, and integrate them w.r.t. the measures $p(x)$ defined by a
parametrized family. (In fact, the smoothness requirement for $M$ that we
really need is continuous differentiability, that is, a $C^1$-structure, but
since in statistics the precise smoothness requirement is usually not an
important issue, we do not elaborate upon this point.) We shall now
introduce the technical conditions needed to realize universal objects on
$\mathcal{M}(\Omega)$ on such parametrized families.

If we have a differentiable map $p : M \to \mathcal{M}(\Omega)$ that assigns
to each $x \in M$ a measure $p(x, \cdot) \in \mathcal{M}_+(\Omega, \mu_0)$,
then we can push forward vector fields (or other contravariant tensor
fields) on $M$, and we can pull back covariant tensors from
$\mathcal{M}_+(\Omega, \mu_0)$. We need to impose various conditions on the
tensors and on the maps, which we are now going to develop.

Definition 2.1. A covariant $n$-tensor field on $\mathcal{M}(\Omega)$
assigns to each $\mu \in \mathcal{M}(\Omega)$ a multilinear map
$\tau_\mu : \bigoplus^n L^n(\Omega, \mu) \to \mathbb{R}$ that is continuous
w.r.t. the product topology on $\bigoplus^n L^n(\Omega, \mu)$.

In this definition, continuity refers to the continuity of the multilinear
maps $\tau_\mu$ for fixed $\mu$. (This is different from requiring that
$\tau_\mu$ be continuous as a function of $\mu$.)

Such objects then will be pulled back to $M$ under a map
$p : M \to \mathcal{M}(\Omega)$, and they then operate on $n$ vector fields
on $M$. When these vector fields are continuous, their evaluation under the
pulled back covariant tensor field should also be continuous. This
requirement is formalized in

Definition 2.2. A covariant $n$-tensor field $\tau$ on a smooth Banach
manifold $M$ is called (weakly) continuous if for any continuous
contravariant $n$-tensor field $A$ on $M$ the function $\tau(A)$ is a
continuous function on $M$.

In contrast to the preceding definition, which was only concerned with the
continuity of a multilinear operator at each point, this definition requires
that the objects be continuous as functions of the point $x \in M$.

For a map $p : M \to \mathcal{M}(\Omega, \mu)$ the composition
$\bar p := i_{can} \circ p : M \to L^1(\Omega, \mu)$,
$x \mapsto \bar p(x) := \frac{dp(x)}{d\mu}$, will play a central role. Thus,
$\bar p$ is a map from $M$ to $L^1(\Omega, \mu)$ which we can then evaluate
at some $\omega \in \Omega$. We can therefore consider $\bar p$ also as a
map $M \times \Omega \to \mathbb{R}$,
$(x, \omega) \mapsto \bar p(x, \omega) = \frac{dp(x)}{d\mu}(\omega)$, which
we refer to as the density potential. However, this notation is slightly
misleading, and the infinitesimal tangent vector of the family rather
corresponds to $\ln \bar p(x, \omega)$ (recall our discussion above of the
exponentiation of $f \in L^1(\Omega, \mu)$; taking the logarithm is of
course the inverse of exponentiation). In particular, the pushforward of a
tangent vector $V \in T_xM$ is $\partial_V \ln \bar p(x, \omega)$, and we
often simply identify $V$ with its pushforward when the map $p$ is fixed in
a given context.

Our parametrized families of measures will need to satisfy some further 
important technical requirements that we shall now list and that will lead 
us to our technical concept of a parametrized measure model. 

(1) The parameter space $M$ is a smooth manifold (of class at least $C^1$,
to be precise).

(2) There is a continuous mapping $p : M \to \mathcal{M}_+(\Omega, \mu)$
provided with the $L^1$-topology.

(3) The composition $\bar p = i_{can} \circ p$ is differentiable as a map
from the manifold $M$ to the Banach space $L^1(\Omega, \mu)$
(Gâteaux-differentiability).

(4) The 1-form

(2.1)   $A(V)_x := \int_\Omega \partial_V \ln \bar p(x, \omega)\, dp(x),$

the Fisher quadratic form

(2.2)   $g^F(V, W)_x := \int_\Omega \partial_V \ln \bar p(x, \omega)\, \partial_W \ln \bar p(x, \omega)\, dp(x)$

and the Amari-Chentsov 3-symmetric tensor

(2.3)   $T^{AC}(V, W, X)_x := \int_\Omega \partial_V \ln \bar p(x, \omega)\, \partial_W \ln \bar p(x, \omega)\, \partial_X \ln \bar p(x, \omega)\, dp(x)$

are well-defined and continuous in the sense of Definition 2.2.

We can now state our general definition of a parametrized measure model.

Definition 2.3. (cf. [3, §2, p. 25], 01 §2.1]) Let $k \ge 1$. A
$k$-integrable parametrized measure model is a quadruple
$(M, \Omega, \mu, p)$ consisting of a smooth (finite-dimensional or
infinite-dimensional) Banach manifold $M$ and a continuous map
$p : M \to \mathcal{M}_+(\Omega, \mu)$ provided with the $L^1$-topology such
that

(1) the function
$x \mapsto \ln \bar p(x, \omega) := \ln \frac{dp(x)}{d\mu}(\omega) : M \to \mathbb{R}$
is defined and continuously Gâteaux-differentiable for $\mu$-almost all
$\omega \in \Omega$;

(2) for all $1 \le p \le k$ and for all continuous vector fields $V$ on $M$
the function $\omega \mapsto \partial_V \ln \bar p(x, \omega)$ belongs to
$L^p(\Omega, p(x))$; moreover, the function
$x \mapsto \|\partial_V \ln \bar p(x, \omega)\|_{L^p(\Omega, p(x))}$ is
continuous on $M$.

We call $M$ the parameter space of $(M, \Omega, \mu, p)$. We call
$(M, \Omega, \mu, p)$ a statistical model if
$p(M) \subset \mathcal{P}_+(\Omega, \mu)$. A $k$-integrable parametrized
measure model $(M, \Omega, \mu, p)$ is called immersed if
$\partial_x \ln \bar p : T_xM \to L^k(\Omega, p(x))$ is injective for all
$x \in M$.
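On a finite sample space the integrals (2.1), (2.2) and (2.3) reduce to finite sums, which makes them easy to evaluate. The following sketch assumes, purely for illustration, the one-parameter binomial family with two trials on $\Omega = \{0, 1, 2\}$ with counting measure as reference, and approximates the logarithmic derivative by a central finite difference; none of the function names come from the paper. Only the diagonal values $A(V)$, $g^F(V, V)$ and $T^{AC}(V, V, V)$ for $V = \partial/\partial\theta$ are computed.

```python
import math


def density(theta):
    """Density of Binomial(2, theta) w.r.t. counting measure on {0, 1, 2}."""
    return [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]


def score(theta, h=1e-6):
    """Central finite-difference approximation of d/dtheta ln p(theta, w)."""
    lower, upper = density(theta - h), density(theta + h)
    return [(math.log(b) - math.log(a)) / (2 * h) for a, b in zip(lower, upper)]


def score_moment(theta, order):
    """Sum over w of (d/dtheta ln p)^order * p.

    order = 1, 2, 3 gives the diagonal values of (2.1), (2.2), (2.3).
    """
    return sum(s ** order * p for s, p in zip(score(theta), density(theta)))


theta0 = 0.3
one_form = score_moment(theta0, 1)         # vanishes for a statistical model
fisher = score_moment(theta0, 2)           # closed form: 2 / (theta (1 - theta))
amari_chentsov = score_moment(theta0, 3)   # closed form: 2 (1 - 2 theta) / (theta (1 - theta))^2
```

The closed forms quoted in the comments are the standard second and third moments of the binomial score and serve only as a consistency check for the sketch.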

Remark 2.4. 1. Note that, as explained above, the choice of a reference
measure in $\mathcal{M}_+(\Omega, \mu)$ is immaterial for a $k$-integrable
parametrized measure model $(M, \Omega, \mu, p)$.

2. For a statistical model, (2.1) vanishes identically. Recalling the
identification of the tangent vector $V$ on $M$ with its pushforward
$\partial_V \ln \bar p$, this simply means

(2.4)   $\int_\Omega V\, d\mu = 0.$

To obtain (2.4) we argue as follows. For a curve $x(t)$,
$t \in (-\varepsilon, \varepsilon)$, on $M$ with
$\partial_t := \dot x(t) = V(x(t))$, condition (2) in Definition 2.3
implies that

$f(t) := \int_\Omega \partial_t \ln \bar p(x(t), \omega)\, dp(x(t))$

is continuous and hence integrable over $(-\varepsilon, \varepsilon)$. In
particular, $A(V)_x$ is continuous in $x$. Applying the Fubini theorem and
condition (1) in Definition 2.3, we have

$\int_{-\varepsilon}^{\varepsilon} \int_\Omega \big(\partial_t \ln \bar p(x(t), \omega)\big)\, \bar p(x(t), \omega)\, d\mu\, dt
 = \int_\Omega \int_{-\varepsilon}^{\varepsilon} \big(\partial_t \ln \bar p(x(t), \omega)\big)\, \bar p(x(t), \omega)\, dt\, d\mu$

$ = \int_\Omega \big(\bar p(x(\varepsilon), \omega) - \bar p(x(-\varepsilon), \omega)\big)\, d\mu = 0,$

where the last equality holds because each $p(x(t))$ is a probability
measure. Since $\varepsilon$ can be chosen arbitrarily small and $f$ is
continuous, it follows that $f$ vanishes, which is (2.4).

Observe that the above formula for general $k$-integrable parametrized
measure models implies

(2.5)   $\partial_V \int_\Omega dp(x) = \int_\Omega \partial_V \ln \bar p(x, \omega)\, dp(x)$

for all $x \in M$ and for all tangent vectors $V \in T_xM$.

3. For any $k$-integrable parametrized measure model $(M, \Omega, \mu, p)$
the composition $i_{can} \circ p : M \to L^1(\Omega, \mu)$ is
Gâteaux-differentiable by (2.5) and taking into account

$\|\partial_V e^{\ln \bar p(x)}\|_{L^1(\Omega, \mu)} = \int_\Omega |\bar p(x)\, \partial_V \ln \bar p(x)|\, d\mu = \int_\Omega |\partial_V \ln \bar p(x)|\, dp(x) < \infty.$

4. Any 3-integrable parametrized measure model carries the Fisher quadratic
form and the Amari-Chentsov tensor, which are continuous in the sense of
Definition 2.2. On a $k$-integrable parametrized measure model
$(M, \Omega, \mu, p)$ the covariant symmetric $n$-tensor field
$T^n(V, \dots, V) := \int_\Omega (\partial_V \ln \bar p(x, \omega))^n\, dp(x)$
satisfies the locality and continuity conditions required in the
introduction.

5. In [14] Chentsov considered only statistical models
$(M, \Omega_n, \mu_n, p)$ where $M$ is a submanifold in
$\mathcal{P}_+(\Omega_n, \mu_n)$ and $p$ is the canonical embedding; see
also Example 2.5. Amari and all authors before Pistone and Sempi considered
only statistical models $(M, \Omega, \mu, p)$ where $M$ is
finite-dimensional and $p(M) \subset \mathcal{P}_+(\Omega, \mu)$ [3]. Their
examples satisfy the conditions in Definition 2.3.

Example 2.5. 1. Let $\Omega_n$ be a finite set of $n$ elements and $\mu_n$ a
measure of maximal support on $\Omega_n$. It is evident that
$\mathcal{M}_+(\Omega_n, \mu_n)$ is diffeomorphic to $\mathbb{R}^n$. Let $S$
be a $C^1$-submanifold in $\mathcal{P}_+(\Omega_n, \mu_n)$ and
$i_S : S \to \mathcal{P}_+(\Omega_n, \mu_n)$ the canonical embedding. Then
$(S, \Omega_n, \mu_n, i_S)$ is an immersed $k$-integrable statistical model
for all $k \ge 1$. In particular,
$(\mathcal{P}_+(\Omega_n, \mu_n), \Omega_n, \mu_n, \mathrm{Id})$ is a
$k$-integrable statistical model. Conversely, for any immersed 1-integrable
statistical model $(M, \Omega_n, \mu_n, p)$ the map
$p : M \to \mathcal{P}_+(\Omega_n, \mu_n)$ defines an immersion
$M \to \mathcal{P}_+(\Omega_n, \mu_n)$ between differentiable manifolds.

2. If $s : N \to M$ is a smooth map and $(M, \Omega, \mu, p)$ is a
$k$-integrable parametrized measure model, then
$(N, \Omega, \mu, p \circ s)$ is a $k$-integrable parametrized measure
model.

On a 3-integrable parametrized measure model $(M, \Omega, \mu, p)$ the pair
of the Fisher quadratic form and the Amari-Chentsov tensor is also called
the Amari-Chentsov structure.

Remark 2.6. We would like to compare our concept of a $k$-integrable
parametrized measure model with the concept of a geometrically regular
statistical model proposed by Amari, for instance in [U §2 ]. Amari listed 6
properties a geometrically regular statistical model $\{p(x, \omega)\}$ must
satisfy [2, A1-A6, p. 25]. Condition A1 says that the domain of the
parameter $x$ is homeomorphic to $\mathbb{R}^n$. Conditions A2 and A3 are
equivalent to our condition (2) listed just before Definition 2.3.
Condition A4 requires that $p(x, \omega)$ is smooth in $x$ uniformly in
$\omega$, and moreover that relation (2.5) holds. Condition A5 requires that
the statistical model is 3-integrable, and moreover that the function
$x \mapsto \|\partial_V \ln \bar p(x, \omega)\|_{L^p(\Omega, p(x))}$ exists
and is smooth for $1 \le p \le 3$. The last condition A6 requires that the
Fisher quadratic form is positive definite. Amari's conditions are slightly
stronger than ours, but in general our concept agrees with his.

As mentioned above, we consider tensor fields on parametrized measure
models $(M, \Omega, \mu, p)$ that are inherited from a corresponding field
on the "universal measure set" $\mathcal{M}(\Omega)$ in terms of the
parametrization $p$.

Note that we do not impose any strong regularity conditions on tensor
fields on $\mathcal{M}(\Omega)$. Instead, we assume the required regularity
and continuity conditions to be satisfied on the pull-back of the field with
respect to a parametrization $p : M \to \mathcal{M}(\Omega)$. In addition to
these conditions, the existence of a global tensor on $\mathcal{M}(\Omega)$
sets some compatibility constraints on the associated fields on the class of
parametrized measure models $(M, \Omega, \mu, p)$. In the following
definition we summarize the necessary regularity and compatibility
conditions for tensor fields, which are, in particular, satisfied in the
case of the Fisher quadratic form and the Amari-Chentsov tensor.

Definition 2.7 (Locality and continuity condition). A statistical covariant
(continuous) $n$-tensor field $A$ assigns to each parametrized measure model
$(M, \Omega, \mu, p)$ a continuous (in the sense of Definition 2.2)
covariant $n$-tensor field $A|_{(M, \Omega, \mu, p)}$ on $M$ (cf.
Definition 2.1). A statistical covariant $n$-tensor field $A$ is called
local if for any parametrized measure model $(M, \Omega, \mu, p)$ and any
$V_i \in T_xM$ the value $A|_{(M, \Omega, \mu, p)}(V_1, \dots, V_n)$ depends
only on $p(x)$ and the values
$\partial_{V_1} \ln \bar p(x), \dots, \partial_{V_n} \ln \bar p(x) \in L^n(\Omega, p(x))$
(but not on $p(M)$).

In particular, this means that the value depends only on $p(x)$, but not on
the manifold $M$ defining the parametrized family of which $p(x)$ is a
member.

Remark 2.8. 1. Assume that $A$ is a local statistical covariant continuous
$n$-tensor field. Using Lemma 6.2 and condition (1) in Definition 2.3 we
note that $A$ defines a point-wise continuous $n$-tensor field $\tilde A$ on
$\mathcal{M}(\Omega)$ (Definition 2.1) by setting

(2.6)   $\tilde A_{p(x)}(\partial_{V_1} \ln \bar p(x), \dots, \partial_{V_n} \ln \bar p(x)) = A|_{(M, \Omega, \mu_0, p)}(V_1(x), \dots, V_n(x)).$

Thus, in order to define $A$ it suffices to determine the associated
point-wise continuous $n$-tensor field $\tilde A$ on $\mathcal{M}(\Omega)$
and then verify whether the original statistical field $A$ is continuous.
Condition (2.6) holds for the Fisher quadratic form field and the
Amari-Chentsov tensor field. The choice of $\partial_V \ln \bar p(x)$ is
also related to the Gâteaux-differentiability of $\bar p$ (Remark 2.4, 3).
We choose $L^n(\Omega, p(x))$ as a natural condition for the value
$\partial_{V_i} \ln \bar p(x)$ since it is a natural extension of the
condition for the existence of the Fisher quadratic form and the
Amari-Chentsov tensor on a parametrized measure model. In fact, it is also
possible to replace the value space $L^n(\Omega, p(x))$ by another function
space depending on the measure $p(x)$ (Remark 6.3).

2. The locality and continuity condition obviously holds for tensor fields
on statistical models associated with finite sample spaces as in Chentsov's
work [13].

3. In [22] and [23], Lê proved the following variant of the locality
condition, which had been asked for by Lauritzen [21] and Amari-Nagaoka [3].
For any statistical model $(M, g, T)$ there exist a finite sample space
$\Omega_n$ provided with a dominant measure $\mu_n$ and an immersion
$p : M \to \mathcal{M}(\Omega_n, \mu_n) = \mathcal{M}(\Omega_n)$ such that
the statistical structure $(g, T)$ is induced from the Amari-Chentsov
structure on $(\mathcal{M}(\Omega_n, \mu_n), \Omega_n, \mu_n, \mathrm{Id})$
via $p$.

Our main theorem uses the notion of a sufficient statistic and the as- 
sociated invariance property. As already stated in the introduction, suffi- 
cient statistics are important transformations between parametrized mea- 
sure models, since they preserve the information of the underlying models. 
Although we introduce the corresponding definitions later in the paper, we 
present our main theorem already here so that its main structure guides the 
arguments and motivates further results of the paper. 

Theorem 2.9 (Main Theorem). (1) Assume that $A$ is a local statistical
continuous 1-form field. If $A$ is invariant under sufficient statistics
then there is a continuous function $c : \mathbb{R} \to \mathbb{R}$ such
that for all finite measures $\mu$ on $\Omega$ and for all
$V \in L^1(\Omega, \mu)$ we have

$A_\mu(V) = c(\mu(\Omega)) \int_\Omega V\, d\mu.$

In particular, recalling (2.4), there is no non-trivial weakly continuous
1-form field on statistical models that is invariant under sufficient
statistics. On a parametrized measure model $(M, \Omega, \mu, p)$ the field
$A$ is expressed as follows:

(2.7)   $A(V)_x = c\Big(\int_\Omega dp(x)\Big) \cdot \partial_V \Big(\int_\Omega dp(x)\Big).$

(2) Assume that $F$ is a local statistical continuous quadratic form field.
If $F$ is invariant under sufficient statistics then there is a continuous
function $f : \mathbb{R} \to \mathbb{R}$ such that
$F(x) = f\big(\int_\Omega p(x)\big)\, g^F(x) + A(x)^2$, where $A$ is the
field in (1) and $g^F$ is the Fisher quadratic form. In particular, the
Fisher quadratic form is the unique, up to a constant, weakly continuous
quadratic form field on statistical models that is invariant under
sufficient statistics.

(3) Assume that $T$ is a local statistical continuous covariant symmetric
3-tensor field. If $T$ is invariant under sufficient statistics then there
is a continuous function $t : \mathbb{R} \to \mathbb{R}$ such that
$T(x) = t\big(\int_\Omega p(x)\big)\, T^{AC}(x) + A_1(x)^3 + A_2(x) \cdot g^F(x)$,
where $A_1, A_2$ are fields as described in (1), and $g^F$, $T^{AC}$ are the
Fisher quadratic form and the Amari-Chentsov tensor respectively. In
particular, the Amari-Chentsov tensor is the unique, up to a constant,
weakly continuous 3-symmetric tensor field on statistical models that is
invariant under sufficient statistics.
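The invariance property behind the Main Theorem can be observed numerically on finite sample spaces. The sketch below is only an illustration under assumptions that are not part of the paper: a binomial model on three points is lifted to six points through a Fisher-Neyman factorization $p(\theta, \omega) = q(\theta, \kappa(\omega))\, t(\omega)$, so that the hypothetical statistic `KAPPA` is sufficient, and this is compared with a coarser statistic `COARSE` that is not sufficient for the binomial model. All names and numerical weights are made up.

```python
import math

KAPPA = [0, 0, 1, 1, 2, 2]                  # statistic from 6 points onto 3 points
T_WEIGHTS = [0.3, 0.7, 0.5, 0.5, 0.9, 0.1]  # parameter-free weights, summing to 1 on each fibre
COARSE = [0, 0, 1]                          # statistic from 3 points onto 2 points


def binomial2(theta):
    """Density of Binomial(2, theta) on {0, 1, 2}."""
    return [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]


def pushforward(density, statistic, n_target):
    """Density of the image measure under a statistic between finite sets."""
    image = [0.0] * n_target
    for point, mass in enumerate(density):
        image[statistic[point]] += mass
    return image


def lifted_model(theta):
    """Model on 6 points whose dependence on theta factors through KAPPA."""
    base = binomial2(theta)
    return [base[KAPPA[w]] * T_WEIGHTS[w] for w in range(6)]


def induced_model(theta):
    """Image of the lifted model under the sufficient statistic KAPPA."""
    return pushforward(lifted_model(theta), KAPPA, 3)


def coarse_model(theta):
    """Image of the binomial model under the non-sufficient statistic COARSE."""
    return pushforward(binomial2(theta), COARSE, 2)


def score_moment(model, theta, order, h=1e-6):
    """Sum over w of (d/dtheta ln p)^order * p, via central finite differences."""
    lower, centre, upper = model(theta - h), model(theta), model(theta + h)
    return sum(((math.log(b) - math.log(a)) / (2 * h)) ** order * p
               for a, p, b in zip(lower, centre, upper))


theta0 = 0.3
fisher_lifted = score_moment(lifted_model, theta0, 2)
fisher_induced = score_moment(induced_model, theta0, 2)
ac_lifted = score_moment(lifted_model, theta0, 3)
ac_induced = score_moment(induced_model, theta0, 3)
fisher_coarse = score_moment(coarse_model, theta0, 2)
```

In this toy setting the Fisher quadratic form and the Amari-Chentsov tensor agree before and after applying the sufficient statistic, whereas the non-sufficient statistic strictly decreases the Fisher quadratic form, in line with the monotonicity statement announced for Section 5.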

Campbell noticed that the Fisher metric on parametrized measure models
associated with a finite sample space $\Omega_n$ coincides with the
Shahshahani metric [9], which is important in mathematical biology and game
theory [30]. It would be interesting to find applications in this direction
of the Fisher metric and other natural metrics on generalized statistical
models described in the Main Theorem.

3. The Pistone-Sempi structure 

In this section we study the relations between $k$-integrable parametrized
measure models and statistical models in the Pistone-Sempi theory. First,
we show that the Pistone-Sempi manifold is a $k$-integrable parametrized
measure model for any $k$ (Proposition 3.10). We also construct an example
of a $k$-integrable parametrized measure model which does not admit a
continuous map into the space $\mathcal{M}_+(\Omega, \mu_0)$ with the
topology of Pistone and Sempi (Example 3.11).

In Section 2 we considered the $L^1$-topology of
$\mathcal{M}_+(\Omega, \mu_0)$. However, this set also carries a stronger
natural topology, discovered by Pistone and Sempi, which is referred to as
the exponential topology (also e-topology) [28, §2.1]. In fact, Pistone and
Sempi considered only the space $\mathcal{P}_+(\Omega, \mu)$, but their
theory works also for
$\mathcal{M}_+(\Omega, \mu) = \mathcal{P}_+(\Omega, \mu) \times \mathbb{R}_+$.
Let us briefly recall the notion of the e-topology, which is defined using
the notion of convergence of sequences.

Definition 3.1. [28, Definition 1.1] The sequence
$(\mu_n)_{n \in \mathbb{N}}$ in $\mathcal{M}_+(\Omega, \mu)$ is e-convergent
(exponentially convergent) to $\mu$ if $(\mu_n)_{n \in \mathbb{N}}$ tends to
$\mu$ in the $L^1$-topology as $n \to \infty$, and, moreover, the sequences
$(d\mu_n/d\mu)_{n \in \mathbb{N}}$ and $(d\mu/d\mu_n)_{n \in \mathbb{N}}$
are eventually bounded in each $L^p(\Omega, \mu)$, $p > 1$, that is,
$d\mu_n/d\mu$ and $d\mu/d\mu_n$ converge to 1 with respect to all
$p$-seminorms of $L^p(\Omega, \mu)$, $p > 1$.

While $\mathcal{M}_+(\Omega, \mu_0)$ is connected with respect to the
$L^1$-topology, its set of connected components with respect to the
e-topology is more interesting. In what follows we briefly describe these
components and their structure. Although the stated facts are known from the
work of Pistone and Sempi, our presentation is slightly different and
illuminates more abstract aspects.

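3.1. Orlicz spaces. In this section, we briefly recall the theory of Orlicz
spaces which is needed in Section 3.2 for the description of the geometric
structure on $\mathcal{M}_+(\Omega, \mu_0)$. Most of the results can be
found e.g. in [19].

A function $\phi : \mathbb{R} \to \mathbb{R}$ is called a Young function if
$\phi(0) = 0$, $\phi$ is even, convex, strictly increasing on $[0, \infty)$
and $\lim_{t \to \infty} t^{-1} \phi(t) = \infty$. Given a finite measure
space $(\Omega, \mu)$ and a Young function $\phi$, we define the Orlicz
space

$L^\phi(\mu) := \Big\{ f : \Omega \to \mathbb{R} \ \Big|\ \int_\Omega \phi\Big(\frac{|f|}{a}\Big)\, d\mu < \infty \text{ for some } a > 0 \Big\},$

and on $L^\phi(\mu)$ we define the Orlicz norm

$\|f\|_{\phi, \mu} := \inf \Big\{ a > 0 \ \Big|\ \int_\Omega \phi\Big(\frac{|f|}{a}\Big)\, d\mu \le 1 \Big\}.$

For any Young function, $(L^\phi(\mu), \|\cdot\|_{\phi, \mu})$ is a Banach
space. Moreover, a sequence $(f_n)_{n \in \mathbb{N}}$ in $L^\phi(\mu)$
converges to 0 if and only if

$\lim_{n \to \infty} \int_\Omega \phi(p f_n)\, d\mu = 0$ for all $p > 0$.

Proposition 3.2. Let $(\Omega, \mu)$ be a finite measure space, and let
$\phi_1, \phi_2 : \mathbb{R} \to \mathbb{R}$ be two Young functions. If

$\limsup_{t \to \infty} \frac{\phi_1(t)}{\phi_2(t)} < \infty,$

then $L^{\phi_2}(\mu) \subset L^{\phi_1}(\mu)$, and the inclusion is
continuous, i.e., $\|f\|_{\phi_1, \mu} \le c\, \|f\|_{\phi_2, \mu}$ for some
$c > 0$ and all $f \in L^{\phi_2}(\mu)$. In particular, if

$0 < \liminf_{t \to \infty} \frac{\phi_1(t)}{\phi_2(t)} \le \limsup_{t \to \infty} \frac{\phi_1(t)}{\phi_2(t)} < \infty,$

then $L^{\phi_1}(\mu) = L^{\phi_2}(\mu)$, and the Orlicz norms
$\|\cdot\|_{\phi_1, \mu}$ and $\|\cdot\|_{\phi_2, \mu}$ are equivalent.

Proof. By our hypothesis, $\phi_1(t) \le K \phi_2(t)$ for some $K \ge 1$ and
all $t \ge t_0$. Let $f \in L^{\phi_2}(\mu)$ and
$a > \|f\|_{\phi_2, \mu}$. Moreover, decompose

$\Omega := \Omega_1 \,\dot\cup\, \Omega_2$ with $\Omega_1 := \{\omega \in \Omega \mid |f(\omega)| \ge a t_0\}.$

Then

$K \ge K \int_\Omega \phi_2\Big(\frac{|f|}{a}\Big)\, d\mu \ge \int_{\Omega_1} K \phi_2\Big(\frac{|f|}{a}\Big)\, d\mu$

$\ge \int_{\Omega_1} \phi_1\Big(\frac{|f|}{a}\Big)\, d\mu$   as $\frac{|f|}{a} \ge t_0$ on $\Omega_1$

$= \int_\Omega \phi_1\Big(\frac{|f|}{a}\Big)\, d\mu - \int_{\Omega_2} \phi_1\Big(\frac{|f|}{a}\Big)\, d\mu$

$\ge \int_\Omega \phi_1\Big(\frac{|f|}{a}\Big)\, d\mu - \int_{\Omega_2} \phi_1(t_0)\, d\mu$   as $\frac{|f|}{a} < t_0$ on $\Omega_2$

$\ge \int_\Omega \phi_1\Big(\frac{|f|}{a}\Big)\, d\mu - \phi_1(t_0)\, \mu(\Omega).$

Thus, $\int_\Omega \phi_1\big(\frac{|f|}{a}\big)\, d\mu \le K + \phi_1(t_0)\, \mu(\Omega) =: c$,
hence $f \in L^{\phi_1}(\mu)$. Convexity and $\phi_1(0) = 0$ imply that
$\phi_1(c^{-1} t) \le c^{-1} \phi_1(t)$, as $c \ge 1$, and hence

$\int_\Omega \phi_1\Big(\frac{|f|}{ac}\Big)\, d\mu \le c^{-1} \int_\Omega \phi_1\Big(\frac{|f|}{a}\Big)\, d\mu \le 1,$

so that $ac \ge \|f\|_{\phi_1, \mu}$ whenever $a > \|f\|_{\phi_2, \mu}$, and
this shows the claim. □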

The following lemma is a straightforward consequence of the definitions
and we omit the proof.

Lemma 3.3. Let $(\Omega, \mu)$ be a finite measure space, let
$\phi : \mathbb{R} \to \mathbb{R}$ be a Young function, and let
$\tilde\phi(t) := \phi(\lambda t)$ for some constant $\lambda > 0$.

Then $\tilde\phi$ is also a Young function. Moreover,
$L^{\tilde\phi}(\mu) = L^\phi(\mu)$ and
$\|\cdot\|_{\tilde\phi, \mu} = \lambda \|\cdot\|_{\phi, \mu}$, so that these
norms are equivalent.
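On a finite measure space the Orlicz norm defined above can be computed by bisection, because the integral $\int_\Omega \phi(|f|/a)\, d\mu$ is decreasing in $a$. The sketch below assumes a measure space with finitely many atoms and uses the Young function $\cosh t - 1$; the helper names `modular` and `orlicz_norm` and the numbers are hypothetical and not taken from the paper. It checks the scaling rule of Lemma 3.3 for $\lambda = 3$.

```python
import math


def young_cosh(t):
    """The Young function cosh(t) - 1."""
    return math.cosh(t) - 1.0


def modular(f, weights, phi, a):
    """Integral of phi(|f| / a) against the finite measure given by `weights`."""
    try:
        return sum(phi(abs(v) / a) * m for v, m in zip(f, weights))
    except OverflowError:
        # phi grew beyond floating-point range: treat the integral as infinite.
        return math.inf


def orlicz_norm(f, weights, phi, tol=1e-12):
    """inf{a > 0 : modular(a) <= 1}, located by bisection."""
    if all(v == 0 for v in f):
        return 0.0
    lower, upper = 0.0, 1.0
    while modular(f, weights, phi, upper) > 1.0:
        upper *= 2.0
    # Invariant: modular(upper) <= 1 and (lower == 0 or modular(lower) > 1).
    while upper - lower > tol * upper:
        middle = 0.5 * (lower + upper)
        if modular(f, weights, phi, middle) <= 1.0:
            upper = middle
        else:
            lower = middle
    return upper


f_values = [1.0, -2.0, 0.5]   # a function on three atoms
masses = [0.2, 0.3, 0.5]      # masses of the atoms
scale = 3.0                   # the constant lambda of Lemma 3.3

norm_phi = orlicz_norm(f_values, masses, young_cosh)
norm_scaled = orlicz_norm(f_values, masses, lambda t: young_cosh(scale * t))
```

The bisection only relies on monotonicity of the integral in $a$, so the same routine applies to any Young function supplied as `phi`.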

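Furthermore, we investigate how the Orlicz spaces relate when changing
the measure $\mu$ to an equivalent measure
$\mu' \in \mathcal{M}(\Omega, \mu)$.

Proposition 3.4. Let $0 \ne \mu' \in \mathcal{M}(\Omega, \mu)$ be a measure
such that $d\mu'/d\mu \in L^p(\Omega, \mu)$ for some $p > 1$, and let
$q > 1$ be the dual index, i.e., $p^{-1} + q^{-1} = 1$. Then for any Young
function $\phi$ we have

$L^{\phi^q}(\mu) \subset L^\phi(\mu'),$

and this embedding is continuous.

Proof. Let $h := d\mu'/d\mu \in L^p(\Omega, \mu)$ and $c := \|h\|_p > 0$. If
$f \in L^{\phi^q}(\mu)$ and $a > \|f\|_{\phi^q, \mu}$, then by Hölder's
inequality we have

$\int_\Omega \phi\Big(\frac{|f|}{a}\Big)\, d\mu' = \int_\Omega \phi\Big(\frac{|f|}{a}\Big)\, h\, d\mu \le c\, \Big\| \phi\Big(\frac{|f|}{a}\Big) \Big\|_q = c \Big( \int_\Omega \phi\Big(\frac{|f|}{a}\Big)^q d\mu \Big)^{1/q} \le c,$

where the last step uses
$\int_\Omega \phi\big(\frac{|f|}{a}\big)^q\, d\mu \le 1$.

Thus, $f \in L^\phi(\mu')$, and $a \ge \|f\|_{c^{-1}\phi, \mu'}$ whenever
$a > \|f\|_{\phi^q, \mu}$, hence
$\|f\|_{\phi^q, \mu} \ge \|f\|_{c^{-1}\phi, \mu'}$. This shows the claim, as
$\|\cdot\|_{c^{-1}\phi, \mu'}$ and $\|\cdot\|_{\phi, \mu'}$ are equivalent
norms on $L^\phi(\mu')$ by Proposition 3.2. □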

3.2. Exponential tangent spaces. For an arbitrary fj, G A^+(r2,/Uo), we 
define the set 

B^{Q) ■={f [-oo,+oo] : G L^{n,n)}, 

which by Holder's inequality is a convex cone inside the space of measurable 
functions Q [— oo,+oo]. For /xq, there is a bijection 

and for G M+{n, iJ,o) we have log^/^ = log^^, -u where u := log^/^(/io). 
That is, log^y canonically identifies A4+{Q,, /io) with a convex set. Moreover, 
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we let 

B^in) := B^{n)n{-B^{n)) 

= {/ : ^ [-00, 00] j e^^ e L^{n, fi)} 
= {/ : ^ [-00, 00] I el^l € L^{n, fx)} 

and 

:= {/ G B^{n) I (1 + s)f G Bf,{n) for some s > 0}. 

The points of ^^(O) are called inner points of B^{Q). 
Note that for fj. G A^+(il,/io) we have B^{Q) C i?^o(il). 

Definition 3.5. Let $\mu \in \mathcal{M}_+(\Omega,\mu_0)$. Then

$T_\mu\mathcal{M}_+(\Omega,\mu_0) := \{ f : \Omega \to [-\infty,\infty] \mid tf \in B_\mu(\Omega) \text{ for some } t \neq 0 \}$

is called the exponential tangent space of $\mathcal{M}_+(\Omega,\mu_0)$ at $\mu$.

Evidently, this space coincides with the Orlicz space $T_\mu\mathcal{M}_+(\Omega,\mu_0) = L^{\cosh t - 1}(\mu)$ and hence has a Banach norm. Moreover, $B_\mu(\Omega) \subset T_\mu\mathcal{M}_+(\Omega,\mu_0)$ contains the unit ball w.r.t. the Orlicz norm and hence is a neighborhood of the origin. Furthermore, $\lim_{t\to\infty} t^p/(\cosh t - 1) = 0$ for all $p \ge 1$, so that Proposition 3.2 implies that

(3.1) $L^\infty(\Omega,\mu) \subset T_\mu\mathcal{M}_+(\Omega,\mu_0) \subset \bigcap_{p \ge 1} L^p(\Omega,\mu)$,

where all inclusions are continuous.

Remark 3.6. In [16, Definition 6], $T_\mu\mathcal{M}_+(\Omega,\mu_0)$ is called the Cramér class of $\mu$. Moreover, in [16, Proposition 7] and [28, Definition 2.2], the subspace of centered Cramér class is defined as the functions $u \in T_\mu\mathcal{M}_+(\Omega,\mu_0)$ with $\int_\Omega u\, d\mu = 0$. Thus, the space of centered Cramér classes is a closed subspace of codimension one.

In order to understand the topological structure of $\mathcal{M}_+(\Omega,\mu_0)$ with respect to the e-topology, it is useful to introduce the following preorder on $\mathcal{M}_+(\Omega,\mu_0)$:

(3.2) $\mu' \preceq \mu$ if and only if $\mu' = \phi\mu$ with $\phi \in L^p(\Omega,\mu)$ for some $p > 1$.

In order to see that $\preceq$ is indeed a preorder, we have to show transitivity, as the reflexivity of $\preceq$ is obvious. Thus, let $\mu'' \preceq \mu'$ and $\mu' \preceq \mu$, so that $\mu' = \phi\mu$ and $\mu'' = \psi\mu'$ with $\phi \in L^p(\Omega,\mu)$ and $\psi \in L^{p'}(\Omega,\mu')$ for some $p, p' > 1$; then $\phi^p, \psi^{p'}\phi \in L^1(\Omega,\mu)$. Let $\lambda := (p'-1)/(p+p'-1) \in (0,1)$. Then by Hölder's inequality, we have:

$L^1(\Omega,\mu) \ni (\psi^{p'}\phi)^{1-\lambda}(\phi^p)^{\lambda} = \psi^{p'(1-\lambda)}\phi^{1+\lambda(p-1)} = (\psi\phi)^{p''}$,

where $p'' = pp'/(p+p'-1) > 1$, so that $\psi\phi \in L^{p''}(\Omega,\mu)$; and hence, $\mu'' \preceq \mu$ as $\mu'' = \psi\phi\mu$.
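The exponent bookkeeping in the transitivity argument is easy to get wrong, so here is a small exact-arithmetic check. It is only a sketch: the function name `composed_exponents` and the sample values of $p$ and $p'$ are chosen for illustration. It confirms that with $\lambda = (p'-1)/(p+p'-1)$ both exponents $p'(1-\lambda)$ and $1+\lambda(p-1)$ equal $p'' = pp'/(p+p'-1)$.

```python
from fractions import Fraction

def composed_exponents(p, p_prime):
    """Return (lambda, exponent of psi, exponent of phi, p'') for the Hölder step."""
    lam = (p_prime - 1) / (p + p_prime - 1)
    psi_exp = p_prime * (1 - lam)          # exponent of psi in (psi^{p'} phi)^{1-lam}
    phi_exp = (1 - lam) + lam * p          # exponent of phi, equal to 1 + lam*(p-1)
    p_double_prime = p * p_prime / (p + p_prime - 1)
    return lam, psi_exp, phi_exp, p_double_prime

p, p_prime = Fraction(3, 2), Fraction(5, 4)
lam, psi_exp, phi_exp, p2 = composed_exponents(p, p_prime)
print(lam, psi_exp, phi_exp, p2)
```

For these sample values one gets $\lambda = 1/7$ and $p'' = 15/14 > 1$, in line with the claim that the composite density lies in some $L^{p''}$ with $p'' > 1$.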






From the preorder $\preceq$ we define the equivalence relation $\sim$ on $\mathcal{M}_+(\Omega,\mu_0)$ by

(3.3) $\mu' \sim \mu$ if and only if $\mu' \preceq \mu$ and $\mu \preceq \mu'$,

in which case we call $\mu$ and $\mu'$ similar, and hence we obtain a partial ordering on the set of equivalence classes $\mathcal{M}_+(\Omega,\mu_0)/\!\sim$:

$[\mu'] \preceq [\mu]$ if and only if $\mu' \preceq \mu$.

If $\mu' \preceq \mu$, then $T_\mu\mathcal{M}_+(\Omega,\mu_0) \subset T_{\mu'}\mathcal{M}_+(\Omega,\mu_0)$ is continuously embedded. Namely, $\lim_{t\to\infty} (\cosh t - 1)^q/(\cosh(qt) - 1) = 2^{1-q}$, and then we apply Propositions 3.2 and 3.4 as well as Lemma 3.3.

In particular, if $\mu \sim \mu'$ then $T_\mu\mathcal{M}_+(\Omega,\mu_0) = T_{\mu'}\mathcal{M}_+(\Omega,\mu_0)$, and we denote this space by $T_{[\mu]}\mathcal{M}_+(\Omega,\mu_0)$. This space is therefore equipped with a family of equivalent Banach norms, and we have continuous inclusions

(3.4) $T_{[\mu]}\mathcal{M}_+(\Omega,\mu_0) \hookrightarrow T_{[\mu']}\mathcal{M}_+(\Omega,\mu_0)$ if $[\mu'] \preceq [\mu]$.

The following now is a reformulation of Propositions 3.4 and 3.5 in [28].

Proposition 3.7. A sequence $(g_n)_{n\in\mathbb{N}} \in \mathcal{M}(\Omega,\mu_0)$ is e-convergent to $g_0 \in \mathcal{M}(\Omega,\mu_0)$ if and only if $g_n\mu_0 \sim g_0\mu_0$ for large $n$, and $u_n := \log|g_n| \in T_{g_0\mu_0}\mathcal{M}_+(\Omega,\mu_0)$ converges to $u_0 := \log|g_0| \in T_{g_0\mu_0}\mathcal{M}_+(\Omega,\mu_0)$ in the Banach norm on $T_{g_0\mu_0}\mathcal{M}_+(\Omega,\mu_0)$ described above.

By virtue of this proposition, we shall refer to the topology on $\mathcal{M}_+(\Omega,\mu_0)$ obtained above as the topology of e-convergence or the e-topology. Our description allows us to describe in a different way the Banach manifold structure on $\mathcal{M}_+(\Omega,\mu_0)$ defined in [28].

Theorem 3.8. Let $K \subset \mathcal{M}_+(\Omega,\mu_0)$ be an equivalence class w.r.t. $\sim$, and let $T := T_{[\mu]}\mathcal{M}_+(\Omega,\mu_0)$ for $\mu \in K$ be the common exponential tangent space, equipped with the e-topology. Then for all $\mu \in K$,

$A_\mu := \log_\mu(K) \subset T$

is open convex. In particular, the identification $\log_\mu : K \to A_\mu$ allows us to canonically identify $K$ with an open convex subset of the affine space associated to $T$.

Remark 3.9. This theorem shows that the equivalence classes w.r.t. $\sim$ are the connected components of the e-topology on $\mathcal{M}_+(\Omega,\mu_0)$, and since each such component is canonically identified as a subset of an affine space whose underlying vector space is equipped with a family of equivalent Banach norms, it follows that $\mathcal{M}_+(\Omega,\mu_0)$ is a Banach manifold. This is the affine Banach manifold structure on $\mathcal{M}_+(\Omega,\mu_0)$ described in [28], therefore we refer to it as the Pistone-Sempi structure.

Proof (of Theorem 3.8). If $f \in A_\mu$, then, by definition, $(1+s)f, -sf \in \hat B_\mu(\Omega)$ for some $s > 0$. In particular, $sf \in B_\mu(\Omega)$, so that $f \in T$ and hence, $A_\mu \subset T$. Moreover, if $f \in A_\mu$ then $\lambda f \in A_\mu$ for $\lambda \in [0,1]$.
Next, if $g \in A_\mu$, then $\mu' := e^g\mu \in K$. Therefore, $f \in A_{\mu'}$ if and only if $K \ni e^f\mu' = e^{f+g}\mu$ if and only if $f + g \in A_\mu$, so that $A_{\mu'} = -g + A_\mu$ for a fixed $g \in T$. From this, the convexity of $A_\mu$ follows.

Therefore, in order to show that $A_\mu \subset T$ is open, it suffices to show that $0 \in A_{\mu'}$ is an inner point for all $\mu' \in K$. For this, observe that for $f \in B^0_{\mu'}(\Omega)$ we have $(1+s)f \in B_{\mu'}(\Omega)$ and hence $e^{\pm(1+s)f} \in L^1(\Omega,\mu')$, so that $e^{-f} \in L^{1+s}(\Omega,\mu')$ and $e^{f} \in L^{1+s}(\Omega,\mu') \subset L^1(\Omega,\mu')$, whence $e^f\mu' \sim \mu' \sim \mu$, so that $e^f\mu' \in K$ and hence, $f \in A_{\mu'}$. Thus, $0 \in B^0_{\mu'}(\Omega) \subset A_{\mu'}$, and since $B^0_{\mu'}(\Omega)$ contains the unit ball of the Orlicz norm, the claim follows. □



In the terminology which we developed, we can formulate the significance of the Pistone-Sempi structure on $\mathcal{M}_+(\Omega,\mu_0)$ as follows.

Proposition 3.10. The quadruple $(\mathcal{M}_+(\Omega,\mu), \Omega, \mu, i_{can})$ is a k-integrable statistical model for all $k \ge 1$.

Proof. Note that for $x \in \mathcal{M}_+(\Omega,\mu)$ we have $\ln p(x,\omega) = \ln x(\omega)$. Using this and the definition of the Pistone-Sempi manifold, we conclude that the first condition in Definition 2.3 holds for the Pistone-Sempi manifold. The second condition in Definition 2.3 also holds for the Pistone-Sempi manifold, since by (3.1) the inclusion $T_\mu\mathcal{M}_+(\Omega,\mu) \to L^k(\Omega,\mu)$ is continuous for all $k \ge 1$. □



The following example shows that the notion of a k-integrable parametrized measure model is more general than the corresponding notion within the theory of Pistone and Sempi.

Example 3.11. Let $\Omega := (0,1)$, and consider the 1-parameter family of finite measures

$p(x) := p(x,t)\,dt := \exp\left(-\frac{x^2}{t^{1/k}}\right) dt \in \mathcal{M}_+((0,1),dt)$, $x \in \mathbb{R}$.

This family defines a $(k-1)$-integrable parametrized measure model: Consider the map

$\ln p(\cdot,t) : x \mapsto -\frac{x^2}{t^{1/k}}$.

It is continuously differentiable for all $t \in (0,1)$ and therefore satisfies condition (1) of Definition 2.3. Now we come to condition (2): With a continuous vector field $V : \mathbb{R} \to \mathbb{R}$, we have

$\partial_V \ln p(x,t) = V(x)\frac{\partial}{\partial x}\ln p(x,t) = V(x)\frac{\partial}{\partial x}\left(-\frac{x^2}{t^{1/k}}\right) = -V(x)\frac{2x}{t^{1/k}}$.

We now show that the function $t \mapsto \partial_V \ln p(x,t)$ belongs to $L^j((0,1),p(x))$ for all $j \le k-1$:

$I^{(j)}(x) := \|\partial_V \ln p(x,\cdot)\|^j_{L^j((0,1),p(x))} = \int_0^1 \left(\frac{2|xV(x)|}{t^{1/k}}\right)^j \exp\left(-\frac{x^2}{t^{1/k}}\right) dt \le (2|xV(x)|)^j \int_0^1 t^{-j/k}\, dt = (2|xV(x)|)^j\, \frac{k}{k-j} < \infty$.

Finally, we have to show that the function $x \mapsto I^{(j)}(x)$ is continuous. In order to verify the continuity at a point $x_0 \in \mathbb{R}$ it is sufficient to consider the restriction of $I^{(j)}$ to the closed interval $[x_0 - \varepsilon, x_0 + \varepsilon]$ with some positive number $\varepsilon$. On this interval, the corresponding integrand is bounded above by a function that only depends on $t$ and is integrable:

$\left(\frac{2|xV(x)|}{t^{1/k}}\right)^j \exp\left(-\frac{x^2}{t^{1/k}}\right) \le \frac{c}{t^{j/k}}$, $c > 0$.

Therefore, by the continuity lemma for integrals, $I^{(j)}$ is continuous, which completes the proof that our family is a $(k-1)$-integrable parametrized measure model. However, it does not define a model in the sense of Pistone and Sempi. In order to see this we show that for all $x \neq 0$, $p(x)$ and $p(0)$ are not similar: Obviously,

$dt = \exp\left(\frac{x^2}{t^{1/k}}\right) dp(x)$.

The similarity of $dp(x)$ and $dt$ would imply that $\exp(x^2/t^{1/k})$ is in $L^{1+s}((0,1),dp(x))$ for some $s > 0$ (see (3.2) and (3.3)). However, for all $s > 0$, using $e^y \ge y^k/k!$ for $y \ge 0$, we have

$\int_0^1 \left(\exp\left(\frac{x^2}{t^{1/k}}\right)\right)^{1+s} dp(x) = \int_0^1 \exp\left(\frac{s x^2}{t^{1/k}}\right) dt \ge \frac{(s x^2)^k}{k!} \int_0^1 \frac{dt}{t} = \infty$.

Thus, $p(x)$ and $dt$ are in different e-connected components of $\mathcal{M}_+((0,1),dt)$ and, therefore, the map $p$ cannot be continuous with respect to the e-topology. Hence, the parametrized measure model cannot be considered as a submanifold of $\mathcal{M}_+((0,1),dt)$ in the sense of Pistone and Sempi.

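We end this section with the following result which illustrates how the ordering $\preceq$ provides a stratification of $\hat B_{\mu_0}(\Omega)$.
Proposition 3.12. Let $\mu'_0, \mu'_1 \in \mathcal{M}_+(\Omega,\mu_0)$ with $f_i := \log_{\mu_0}(\mu'_i) \in \hat B_{\mu_0}(\Omega)$, and let $\mu'_\lambda := \exp(f_0 + \lambda(f_1 - f_0))\mu_0$ for $\lambda \in [0,1]$ be the segment joining $\mu'_0$ and $\mu'_1$. Then the following hold.

(1) The measures $\mu'_\lambda$ are similar for $\lambda \in (0,1)$.

(2) $\mu'_\lambda \preceq \mu'_0$ and $\mu'_\lambda \preceq \mu'_1$ for $\lambda \in (0,1)$.

(3) $T_{\mu'_\lambda}\mathcal{M}_+(\Omega,\mu_0) = T_{\mu'_0}\mathcal{M}_+(\Omega,\mu_0) + T_{\mu'_1}\mathcal{M}_+(\Omega,\mu_0)$ for $\lambda \in (0,1)$.

Proof. Let $\lambda_1 \in (0,1)$ and $\lambda_2 \in [0,1]$. Then

$\mu'_{\lambda_1} = \exp(f_0 + \lambda_1(f_1 - f_0))\mu_0 = \phi\,\mu'_{\lambda_2}$, where $\phi := \exp((\lambda_1 - \lambda_2)(f_1 - f_0))$ and $\mu'_{\lambda_2} = \exp(f_0 + \lambda_2(f_1 - f_0))\mu_0$.

But now, for $p > 1$ we have

$\phi^p\mu'_{\lambda_2} = \exp(p(\lambda_1 - \lambda_2)(f_1 - f_0)) \exp(f_0 + \lambda_2(f_1 - f_0))\mu_0$
$= \exp(f_0 + (p\lambda_1 + (1-p)\lambda_2)(f_1 - f_0))\mu_0$
$= \mu'_{p\lambda_1 + (1-p)\lambda_2}$.

Since $\lambda_1 \in (0,1)$, it follows that $p\lambda_1 + (1-p)\lambda_2 \in (0,1)$ for $p - 1 > 0$ sufficiently small, so that $\mu'_{p\lambda_1 + (1-p)\lambda_2} \in \mathcal{M}_+(\Omega,\mu_0)$ and hence, $\phi^p \in L^1(\Omega,\mu'_{\lambda_2})$, i.e. $\phi \in L^p(\Omega,\mu'_{\lambda_2})$, for small $p - 1 > 0$. Therefore, $\mu'_{\lambda_1} \preceq \mu'_{\lambda_2}$ for all $\lambda_1 \in (0,1)$ and $\lambda_2 \in [0,1]$, which implies the first and second statement.

This implies that $T_{\mu'_i}\mathcal{M}_+(\Omega,\mu_0) \subset T_{\mu'_\lambda}\mathcal{M}_+(\Omega,\mu_0) = T_{\mu'_{1/2}}\mathcal{M}_+(\Omega,\mu_0)$ for $i = 0,1$ and all $\lambda \in (0,1)$, which shows one inclusion in the third statement.

In order to complete the proof, let $\Omega := \Omega_+ \sqcup \Omega_-$, where $\Omega_+ := \{\omega \in \Omega \mid (f_1 - f_0)(\omega) > 0\}$ and $\Omega_- := \{\omega \in \Omega \mid (f_1 - f_0)(\omega) \le 0\}$. Now let $g \in T_{\mu'_{1/2}}\mathcal{M}_+(\Omega,\mu_0)$, so that $tg \in B_{\mu'_{1/2}}(\Omega)$ for some $t \neq 0$. Then

$\int_\Omega \exp(|t g \chi_{\Omega_+}|)\, d\mu'_0 = \int_{\Omega_+} \exp(|tg|)\, d\mu'_0 + \int_{\Omega_-} d\mu'_0$
$\le \int_{\Omega_+} \exp(|tg| + \tfrac{1}{2}(f_1 - f_0))\, d\mu'_0 + \int_\Omega d\mu'_0$
$= \int_{\Omega_+} \exp(|tg|)\, d\mu'_{1/2} + \int_\Omega d\mu'_0 < \infty$,

so that $g\chi_{\Omega_+} \in T_{\mu'_0}\mathcal{M}_+(\Omega,\mu_0)$. Analogously, one shows that $g\chi_{\Omega_-} \in T_{\mu'_1}\mathcal{M}_+(\Omega,\mu_0)$ and hence, $g = g\chi_{\Omega_+} + g\chi_{\Omega_-} \in T_{\mu'_0}\mathcal{M}_+(\Omega,\mu_0) + T_{\mu'_1}\mathcal{M}_+(\Omega,\mu_0)$. □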

4. Sufficient statistics and the Amari-Chentsov structure 

A statistic $\kappa$ is a measurable map between a measure space $(\Omega_1,\mu_1)$ and a measurable space $\Omega_2$. The notion of a sufficient statistic (Definition 4.1, Lemma 4.3) was introduced by Fisher in 1922. It plays an important role in mathematical statistics as a whole and, in particular, in estimation theory. We give a simple proof that the Amari-Chentsov structure is invariant under sufficient statistics (Theorem 4.5). We also give a geometric proof of the Fisher-Neyman factorization theorem, which characterizes a sufficient statistic $\kappa : (\Omega_1,\mu_1) \to \Omega_2$ under the assumption that $\kappa : \Omega_1 \to \Omega_2$ is a smooth map (Theorem 4.10). Using Theorem 4.10 we present a proof of the monotonicity theorem, also called the Cramér-Rao inequality (Theorem 4.11). At the end of the section we consider examples of sufficient statistics, which are associated with Markov congruent embeddings from $\mathcal{M}_+(\Omega_n,\mu_n)$ to $\mathcal{M}_+(\Omega_m,\mu_m)$ (Example 4.14). Using them we discuss Chentsov's results on geometric structures which are invariant under sufficient statistics between finite sample spaces (Proposition 4.17, Lemma 4.18).

For a measurable map $\kappa : (\Omega_1,\mu_1) \to \Omega_2$ let us denote by $\kappa_*(\mu_1)$ the push-forward measure on $\Omega_2$.

Definition 4.1. (cf. [U (2.17)], P Theorem 1, p. 117]) Assume that 
$(M, \Omega_1, \mu_1, p_1)$ is a k-integrable parametrized measure model and $\Omega_2$ is a measurable space. A statistic $\kappa : (\Omega_1,\mu_1) \to \Omega_2$ is said to be sufficient for the parameter $x \in M$ if there exist a function $s : M \times \Omega_2 \to \mathbb{R}$ and a function $t \in L^1(\Omega_1,\mu_1)$ such that for all $x \in M$ we have $s(x,\omega_2) \in L^1(\Omega_2,\kappa_*(\mu_1))$ and

(4.1) $p_1(x,\omega_1) = s(x,\kappa(\omega_1))\,t(\omega_1)$ $\mu_1$-a.e.



Remark 4.2. Definition 4.1 is a version of the Fisher-Neyman characterization theorem, which states that a statistic is sufficient for the parameter $x \in M$ if and only if (4.1) holds. The Fisher-Neyman characterization theorem is simpler to formulate than the corresponding definition in textbooks on mathematical statistics, e.g. in [8, Definition 1, p. 116], which involves the notion of conditional distribution.

A measurable map $\kappa : (\Omega_1,\mu_1) \to \Omega_2$ transforms a parametrized measure model $(M, \Omega_1, \mu_1, p_1)$ into the parametrized measure model $(M, \Omega_2, \kappa_*(\mu_1), \kappa_*(p_1))$, whose density potential $\kappa_*(p_1)$ is defined by

(4.2) $\kappa_*(p_1)(x) := \frac{d\kappa_*(p_1(x))}{d\kappa_*(\mu_1)}$.

Lemma 4.3. A statistic $\kappa : (\Omega_1,\mu_1) \to \Omega_2$ is sufficient for the parameter $x \in M$ if and only if the function

$r(x,\omega_1) := \frac{p_1(x,\omega_1)}{\kappa_*(p_1)(x,\kappa(\omega_1))}$

does not depend on $x$ for almost all $\omega_1 \in (\Omega_1,\mu_1)$.

Proof. The "if" part of Lemma 4.3 is obvious. Now we assume that (4.1) holds, i.e. $p_1(x,\omega_1) = s(x,\kappa(\omega_1)) \cdot t(\omega_1)$ for all $x \in M$ and almost everywhere on $\Omega_1$. Then for all $x \in M$ and almost all $\omega_1 \in \Omega_1$ we have

(4.3) $\kappa_*(p_1)(x,\kappa(\omega_1)) = \kappa_*(t)(\kappa(\omega_1)) \cdot s(x,\kappa(\omega_1))$.
From (4.3) we obtain for all $x \in M$

(4.4) $r(x,\omega_1) = \frac{t(\omega_1)\,s(x,\kappa(\omega_1))}{\kappa_*(t)(\kappa(\omega_1)) \cdot s(x,\kappa(\omega_1))} = \frac{t(\omega_1)}{\kappa_*(t)(\kappa(\omega_1))}$ $\mu_1$-a.e.

This completes the proof of Lemma 4.3. □

We get immediately

Corollary 4.4. Assume that $\kappa : \Omega_1 \to \Omega_2$ is a sufficient statistic for the parameter $x \in M$, where $(M, \Omega_1, \mu_1, p_1)$ is a k-integrable parametrized measure model. Then $(M, \Omega_2, \kappa_*(\mu_1), \kappa_*(p_1))$ is also a k-integrable parametrized measure model.

Let $\kappa : (\Omega_1,\mu_1) \to (\Omega_2,\mu_2)$ be a statistic and $(M, \Omega_1, \mu_1, p_1)$ a k-integrable parametrized measure model. The Fisher quadratic form on the transformed parametrized measure model $(M, \Omega_2, \kappa_*(\mu_1), \kappa_*(p_1))$ is defined by

(4.5) $\tilde g^F(V,V)_x := \int_{\Omega_2} (\partial_V \ln \kappa_*(p_1)(x,\omega_2))^2\, d\kappa_*(p_1(x))$.

Theorem 4.5. If a statistic $\kappa$ is sufficient for the parameter $x \in M$, then the Amari-Chentsov structure transformed by $\kappa$ is equal to the original structure.

Proof. Assume that a statistic $\kappa$ is sufficient for the parameter $x \in M$. By Lemma 4.3 we have for all $x \in M$

(4.6) $p_1(x,\omega_1) = r(\omega_1)\,\kappa_*(p_1(x))(\kappa(\omega_1))$ $\mu_1$-a.e.
Hence for all $x \in M$ and all $V \in T_xM$

(4.7) $\partial_V \ln p_1(x,\omega_1) = \partial_V \ln \kappa_*(p_1(x))(\kappa(\omega_1))$ $\mu_1$-a.e.
It follows for all $x \in M$ and all $V \in T_xM$

$g^F(V,V)_x = \int_{\Omega_1} (\partial_V \ln \kappa_*(p_1(x))(\kappa(\omega_1)))^2\, r(\omega_1)\,\kappa_*(p_1(x))(\kappa(\omega_1))\, d\mu_1 = \tilde g^F(V,V)_x$.

This proves the invariance of the Fisher metric under sufficient statistics. The invariance of the Amari-Chentsov tensor under sufficient statistics is proved in the same way. □
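On a finite sample space the invariance statement can be verified by hand. The sketch below is an illustration under assumptions made here, not taken from the text: a four-point space with counting reference measure, a two-valued statistic `kappa`, and a density of the factorized form (4.1), $p(x,\omega) = s(x,\kappa(\omega))\,t(\omega)$ with $s(x,0) = x$, $s(x,1) = 1-x$. The helper names `pushforward` and `fisher` are hypothetical. The Fisher quadratic form is computed as $\sum_\omega (\partial_x p)^2/p$.

```python
from fractions import Fraction as F

def pushforward(values, kappa, m):
    """Sum the entries of `values` over the fibers of kappa (target has m points)."""
    out = [F(0)] * m
    for omega, v in enumerate(values):
        out[kappa[omega]] += v
    return out

def fisher(p, dp):
    """Fisher quadratic form sum (d ln p)^2 * p = sum dp^2 / p on a finite space."""
    return sum(d * d / v for v, d in zip(p, dp))

kappa = [0, 0, 1, 1]                           # statistic on a 4-point space
t = [F(3, 10), F(7, 10), F(3, 5), F(2, 5)]     # x-independent factor t(omega)

def model(x):
    s, ds = [x, 1 - x], [F(1), F(-1)]          # s(x, omega_2) and its x-derivative
    p = [s[kappa[w]] * t[w] for w in range(4)]
    dp = [ds[kappa[w]] * t[w] for w in range(4)]
    return p, dp

x = F(1, 4)
p, dp = model(x)
g_original = fisher(p, dp)
# Push-forward is linear, so it commutes with the derivative in x.
g_pushed = fisher(pushforward(p, kappa, 2), pushforward(dp, kappa, 2))
print(g_original, g_pushed)
```

Both values equal $1/x + 1/(1-x)$, so the Fisher form computed on the coarse space agrees with the original one, as Theorem 4.5 predicts for a sufficient statistic.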

Corollary 4.6. Assume that $\Omega$ is a differentiable manifold provided with the Borel $\sigma$-algebra. The Amari-Chentsov structure on any k-integrable parametrized measure model $(M, \Omega, \mu, p)$ is invariant under the action of the diffeomorphism group of $\Omega$.

Remark 4.7. The first known variant of Theorem 4.5 is the second part of the Cramér-Rao inequality (Theorem 4.11) [11], [29]. The invariance of the Amari-Chentsov structure on statistical models associated with finite sample spaces under sufficient statistics was first discovered by Chentsov [14].
In what follows we interpret the function $r(x,\omega_1)$, assuming that $\Omega_1$ and $\Omega_2$ are smooth manifolds supplied with the Borel $\sigma$-algebra and $\kappa$ is smooth. Furthermore, we assume that $\mu_1$ is dominated by a Lebesgue measure on $\Omega_1$, i.e. a measure that is locally equivalent to the Lebesgue measure on $\mathbb{R}^n$. Then the set $\Omega_2^{sing}$ of singular values of $\kappa$ is a null set in $(\Omega_2, \kappa_*(\mu_1))$. Let $\omega_2$ be a regular value of $\kappa$. Then $\kappa^{-1}(\omega_2)$ is a smooth submanifold of $\Omega_1$. Furthermore, any sufficiently small open neighborhood $U_\varepsilon(\omega_2) \subset \Omega_2$ of $\omega_2$ consists only of regular values of $\kappa$. Without loss of generality we assume that the preimage $\kappa^{-1}(U_\varepsilon(\omega_2))$ is a direct product $U_\varepsilon(\omega_2) \times \kappa^{-1}(\omega_2)$, which is the case if $U_\varepsilon(\omega_2)$ is diffeomorphic to a ball. The measure $\mu_1$ (respectively, $p_1(x)$) on the source space and the induced measure $\kappa_*(\mu_1)$ (respectively, $\kappa_*(p_1(x))$) on the target space define a "vertical" measure $\mu^\perp_{\omega_2}$, which depends on $\mu_1$, on each fiber $\kappa^{-1}(\omega_2)$ by the following formula:



for all $y \in \kappa^{-1}(\omega_2)$. (Respectively, we replace $\mu_1$ by $p_1(x)$ in the LHS and RHS of (4.8).) Here we identify a point $(\omega_2, y \in \kappa^{-1}(\omega_2))$ with the image of $y$ in $\Omega_1$ via the inclusion $\kappa^{-1}(\omega_2) \to \Omega_1$. Note that $d\mu^\perp_{\omega_2}(\mu_1,y)$ is well-defined only if $\omega_2 \in \kappa(\Omega_1)$.

Lemma 4.8. Assume that the value $\omega_2$ of a statistic $\kappa$ is regular. Then $\mu^\perp_{\omega_2}(\mu_1)$ is a probability measure on $\kappa^{-1}(\omega_2)$ for any finite measure $\mu_1$ on $\Omega_1$.
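Proof. We need to show that

(4.9) $\int_{\kappa^{-1}(\omega_2)} d\mu^\perp_{\omega_2}(\mu_1,y) = 1$.

Fix a Riemannian metric on $\Omega_2$. Denote by $D_\varepsilon(\omega_2)$ the disk with center at $\omega_2$ and of radius $\varepsilon$. Using (4.8) and Fubini's formula we obtain

(4.10) $\int_{D_\varepsilon(\omega_2)} d\kappa_*(\mu_1) \int_{\kappa^{-1}(\omega)} d\mu^\perp_{\omega}(\mu_1,y) = \int_{\kappa^{-1}(D_\varepsilon(\omega_2))} d\mu_1$.
Taking into account

(4.11) $\int_{\kappa^{-1}(D_\varepsilon(\omega_2))} d\mu_1 = \int_{D_\varepsilon(\omega_2)} d\kappa_*(\mu_1)$,
we derive from (4.10)

(4.12) $\int_{\kappa^{-1}(\omega_2)} d\mu^\perp_{\omega_2}(\mu_1,y) = \lim_{\varepsilon \to 0} \frac{\int_{\kappa^{-1}(D_\varepsilon(\omega_2))} d\mu_1}{\int_{D_\varepsilon(\omega_2)} d\kappa_*(\mu_1)} = 1$.

This proves (4.9) and Lemma 4.8. □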

Remark 4.9. The measure $\mu^\perp_{\omega_2}$ is the conditional distribution $(d\omega_1|\omega_2)$ of the variable (elementary event) $\omega_1$ subject to the condition $\kappa = \omega_2$. In general, a conditional distribution $(d\omega_1|\omega_2)$ of the variable $\omega_1$ subject to the condition $\kappa = \omega_2$ can be defined for measurable mappings, which need not be smooth. We refer to [TBI P- 81], [U p. 106] for a definition of a conditional distribution in a general case.

Theorem 4.10. Assume that $\Omega_1$ and $\Omega_2$ are smooth manifolds supplied with Borel $\sigma$-algebras and $\mu_1$ is a measure on $\Omega_1$ dominated by a Lebesgue measure. Let $(M, \Omega_1, \mu_1, p_1)$ be a k-integrable parametrized measure model. A smooth statistic $\kappa : (\Omega_1,\mu_1) \to \Omega_2$ is sufficient for the parameter $x \in M$ if and only if the conditional distribution $\mu^\perp_{\omega_2}(p_1(x))$, defined on the set of regular values $\omega_2$ of $\kappa$, is independent of $x \in M$.

Proof. Representing a point uji by the pair {K{uji),y), y € k~^{k{uji)), we 
write 

(4.13) dn^^^^^{pi{x),y) = 

where 

Observe that (|4.13|) is equivalent to the following 

(4.15) pi{x,{K{uji),y)) = fl-j^^^^-^{x,y)K^{pi){x,K{uJi)). 

(|4.15p implies that i^^i^^^-^i^^u) coincides with r{x.,uji). Now we obtain The- 
orem STTO] from Lemma 14.31 immediately. □ 

Using Lemma 4.3 and Theorem 4.10 we will present a proof of the monotonicity theorem (Theorem 4.11), also called the Cramér-Rao inequality, which characterizes sufficient statistics in terms of the Fisher information metric.

Theorem 4.11. (Cramér-Rao inequality, cf. [H Theorem 2.1]). Assume that $\Omega_1$ and $\Omega_2$ are smooth manifolds provided with Borel $\sigma$-algebras and $\mu_1$ is a measure on $\Omega_1$ dominated by a Lebesgue measure. Let $(M, \Omega_1, \mu_1, p_1)$ be a k-integrable parametrized measure model and $\kappa : \Omega_1 \to \Omega_2$ a statistic. Denote by $\tilde g^F$ the Fisher metric on the transformed parametrized measure model $(M, \Omega_2, \kappa_*(\mu_1), \kappa_*(p_1))$. For each $x \in M$ and each $V \in T_xM$ we have

(4.16) $\tilde g^F(V,V)_x \le g^F(V,V)_x$.

Inequality (4.16) becomes an equality for all $x \in M$ and for all $V \in T_xM$ if and only if the statistic $\kappa$ is sufficient for the parameter $x \in M$.

Proof. Since the space of smooth maps is dense in the space of measurable maps with respect to the $L^p$-topology, for any $p \ge 1$, and taking into account $\partial_V \kappa_*(p) = \kappa_*(\partial_V p)$, for a proof of Theorem 4.11 we can assume that $\kappa$ is smooth. Denote by $\Omega_2^{reg}$ the set of regular values of $\kappa$. Using (4.8), we obtain

(4.17) $g^F(V,V)_x = \int_{\Omega_2^{reg}} d\kappa_*(p_1(x)) \int_{\kappa^{-1}(\omega_2)} (\partial_V \ln p_1(x,y))^2\, d\mu^\perp_{\omega_2}(p_1(x),y)$.
Recall that

(4.18) $\tilde g^F(V,V)_x = \int_{\Omega_2} (\partial_V \ln \kappa_*(p_1)(x,\omega_2))^2\, d\kappa_*(p_1(x))$.

To prove Theorem 4.11, comparing (4.17) with (4.18), it suffices to show that for each $x \in M$ and for each $\omega_2 \in \Omega_2^{reg}$ the following inequality holds

(4.19) $\int_{\kappa^{-1}(\omega_2)} (\partial_V \ln p_1(x,y))^2\, d\mu^\perp_{\omega_2}(p_1(x),y) \ge (\partial_V \ln \kappa_*(p_1)(x,\omega_2))^2$

and the equality holds for all $x \in M$ and all regular values $\omega_2$ if and only if $\kappa$ is sufficient for the parameter $x \in M$.

Taking into account (4.15) and Lemma 4.8 we note that (4.19) is equivalent to the following inequality

(4.20) $\int_{\kappa^{-1}(\omega_2)} (\partial_V \ln \kappa_*(p_1)(x,\omega_2) + \partial_V \ln \mu^\perp_{\omega_2}(x,y))^2\, d\mu^\perp_{\omega_2}(p_1(x),y) \ge \int_{\kappa^{-1}(\omega_2)} (\partial_V \ln \kappa_*(p_1)(x,\omega_2))^2\, d\mu^\perp_{\omega_2}(p_1(x),y)$.
Lemma 4.12. For all x ^ M we have 

(4.21) / dv\nfij:^^^){x,y)d^ii^,^^{pi{x),y) = Q. 

J{y£K.{uJi)} 

Proof. Writing l^i;^^^){piix)) = fi-t(^^^){x, y)n^^^^-^{ni), we observe that (|4.2ip 
is a consequence of the following identity for all x G M: 

{yeK{LUi)} 

whose validity follows from Lemma l4.8i □ 

Clearly (4.20) follows from Lemma 4.12, since $\partial_V \ln \kappa_*(p_1)(x,\omega_2)$ does not depend on $y$. Note that (4.20), and hence (4.19), becomes an equality if and only if $\mu^\perp_{\kappa(\omega_1)}(p_1(x))$ is independent of $x$. By Theorem 4.10 the last condition is equivalent to the sufficiency of the statistic $\kappa$ for the parameter $x \in M$. This proves Theorem 4.11. □
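The monotonicity statement and its equality case can be seen on a toy model. The following sketch is an illustration chosen here, not an example from the text: two independent coin flips with heads probability $x$, with outcomes ordered HH, HT, TH, TT. The statistic "first flip" is not sufficient and loses Fisher information, while the statistic "number of heads" is sufficient and preserves it. The helper names are hypothetical.

```python
from fractions import Fraction as F

def pushforward(values, kappa, m):
    """Sum the entries of `values` over the fibers of kappa (target has m points)."""
    out = [F(0)] * m
    for omega, v in enumerate(values):
        out[kappa[omega]] += v
    return out

def fisher(p, dp):
    """Fisher quadratic form sum dp^2 / p on a finite sample space."""
    return sum(d * d / v for v, d in zip(p, dp))

def two_flips(x):
    """Densities and x-derivatives for outcomes HH, HT, TH, TT."""
    p = [x * x, x * (1 - x), (1 - x) * x, (1 - x) * (1 - x)]
    dp = [2 * x, 1 - 2 * x, 1 - 2 * x, -2 * (1 - x)]
    return p, dp

first_flip = [0, 0, 1, 1]      # not sufficient: forgets the second flip
heads_count = [2, 1, 1, 0]     # sufficient: p factors through the count

x = F(1, 3)
p, dp = two_flips(x)
g = fisher(p, dp)
g_first = fisher(pushforward(p, first_flip, 2), pushforward(dp, first_flip, 2))
g_count = fisher(pushforward(p, heads_count, 3), pushforward(dp, heads_count, 3))
print(g, g_first, g_count)
```

Here $g = 2/(x(1-x))$, the first-flip statistic yields exactly half of it, and the heads count recovers the full value, matching the inequality (4.16) and its equality case.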

Remark 4.13. 1. Assume that a statistic k is smooth. Denote by g^^ the 
Fisher quadratic form on the statistical model fi^^{pi{x) , y) with respect to 
the reference measure /Uk(cji)('"i> 2/) (|4.14p . Taking into account (j4.17p . 
(j4.18p and (j4.2ip we obtain immediately the following equality for all x G M 
and all V €T^M (cf. d Theorem 2.1]) 

(4.22) gF{V,V)=g^iV,V)+ [ g^^{V,V)dKMx))- 

The integral in the RHS of (4.22) is called the information loss [2, p. 30].
Example 4.14. Let $\Omega_n$ be a finite set of $n$ elements $E_1, \dots, E_n$. Let $\mu_n$ be the probability distribution on $\Omega_n$ such that $\mu_n(E_i) = 1/n$ for $i \in [1,n]$. Clearly, the space $\mathcal{P}(\Omega_n,\mu_n)$ consists of all probability distributions $p$ on $\Omega_n$ which can be represented as

(4.23) $p(E_i) = f(E_i)\,\mu_n(E_i)$, for $i \in [1,n]$,

for some non-negative function $f : \Omega_n \to \mathbb{R}$ such that $\sum_{i=1}^n f(E_i) = n$.

Denote by $E_i^*$ the Dirac measure on $\Omega_n$ concentrated at $E_i$. The space $\mathcal{M}_+(\Omega_n,\mu_n)$ of measures equivalent to $\mu_n$ consists of all measures $p = \sum_{i=1}^n p_i E_i^*$ with $p_i > 0$, i.e. of the positive cone $\mathbb{R}^n_{>0}$. Let $n \le m < \infty$. Let $\{F_1, \dots, F_n\}$ be a partition of the set $\Omega_m := \{F_1, \dots, F_m\}$ into disjoint subsets. Denote this partition by $R$. We associate $R$ with a map $\kappa : \Omega_m \to \Omega_n$ by setting

$\kappa(x) := E_i$ if $x \in F_i$.

We identify $\mathcal{M}_+(\Omega_m,\mu_m)$ with $\mathbb{R}^m_{>0}$, which is generated by the Dirac measures $F_j^*$, $j \in [1,m]$. Recall that a linear mapping $\Pi : \mathbb{R}^n \to \mathbb{R}^m$, $\Pi(E_k^*) := \sum_{j=1}^m \Pi_{kj} F_j^*$, is called a Markov mapping if $\Pi_{kj} \ge 0$ and $\sum_{j=1}^m \Pi_{kj} = 1$ (cf. Example 5.6). Following Chentsov [14, p. 56 and Lemma 9.5, p. 136], we call $\Pi$ a Markov congruent embedding subjected to a partition $R$ if

• $F_j \notin \kappa^{-1}(E_i) \Longrightarrow \Pi(E_i^*)(F_j) = 0$,
• $\Pi(E_i^*) \neq 0$ for all $i \in [1,n]$.

Note that U^M^i^n, fJ-n)) C M^i^m, IJ-m)- The restriction of 11 to ]R>o = 
M.{fln, fJ-n) is also denoted by 11. 

Proposition 4.15. Let $\Pi : \mathcal{M}(\Omega_n,\mu_n) \to \mathcal{M}(\Omega_m,\mu_m)$ be the restriction of a Markov mapping such that the image $(\Pi(\mathcal{M}_+(\Omega_n,\mu_n)), \Omega_m, \mu_m, i)$ is an immersed statistical model of dimension $n$, where $i$ is the canonical embedding. A statistic $\kappa : \Omega_m \to \Omega_n$ is sufficient for the parameter $x \in \Pi(\mathcal{M}_+(\Omega_n,\mu_n))$ if $\Pi$ is a Markov congruent embedding subjected to $\kappa$.

Proof. Assume that $\Pi$ is a Markov congruent embedding subjected to a statistic $\kappa : \Omega_m \to \Omega_n$. Then $\kappa_* \circ \Pi = \mathrm{Id}$. By the monotonicity of the Markov morphism, see Corollary 5.11 below, $\kappa$ must be sufficient for the parameter $x \in \Pi(\mathcal{M}_+(\Omega_n,\mu_n))$. □

Since $\kappa_* \circ \Pi = \mathrm{Id}$ for Markov congruent embeddings $\Pi$, using Theorem 4.5 we obtain immediately

Corollary 4.16. Let $\Pi : \mathcal{M}(\Omega_n,\mu_n) \to \mathcal{M}(\Omega_m,\mu_m)$ be a Markov congruent embedding. Then the Amari-Chentsov structure on $\mathcal{M}_+(\Omega_n,\mu_n)$ coincides with the Amari-Chentsov structure on $(\mathcal{M}_+(\Omega_n,\mu_n), \Omega_m, \mu_m, \Pi)$.
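The identity $\kappa_* \circ \Pi = \mathrm{Id}$ used above can be made concrete for small $n$ and $m$. The sketch below is illustrative only: it assumes $n = 2$, $m = 3$, a partition sending the first two points of $\Omega_m$ to $E_1$ and the third to $E_2$, and a particular matrix `Pi`; the helper names are hypothetical. It checks the two defining conditions of a Markov congruent embedding and then that pushing $\Pi(p)$ forward along $\kappa$ returns $p$.

```python
from fractions import Fraction as F

kappa = [0, 0, 1]                      # point j of Omega_m is sent to E_{kappa[j]}
Pi = [[F(1, 4), F(3, 4), F(0)],        # Pi(E_1^*) lives on the fiber over E_1
      [F(0),    F(0),    F(1)]]        # Pi(E_2^*) lives on the fiber over E_2

def is_congruent_embedding(Pi, kappa):
    """Rows are probability vectors supported on the matching fiber of kappa."""
    for i, row in enumerate(Pi):
        if any(v < 0 for v in row) or sum(row) != 1:
            return False
        if any(v != 0 and kappa[j] != i for j, v in enumerate(row)):
            return False
    return True

def apply_markov(Pi, p):
    """Image of the measure p = sum_i p_i E_i^* under the linear map Pi."""
    return [sum(p[i] * Pi[i][j] for i in range(len(p))) for j in range(len(Pi[0]))]

def pushforward(q, kappa, n):
    """Push a measure on Omega_m forward to Omega_n along kappa."""
    out = [F(0)] * n
    for j, v in enumerate(q):
        out[kappa[j]] += v
    return out

p = [F(2), F(5)]                       # a finite (non-normalized) measure on Omega_n
q = apply_markov(Pi, p)
back = pushforward(q, kappa, 2)
print(q, back)
```

Because each row of `Pi` has total mass one and is supported on a single fiber, summing over the fibers recovers the original weights, which is the content of $\kappa_* \circ \Pi = \mathrm{Id}$.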

A variant of Proposition 4.15 has been proved by Chentsov [14, Lemma 6.1, p. 77 and Lemma 9.5, p. 136], see also Proposition 5.7 below. It plays a decisive role in Chentsov's work [14] on geometric structures on statistical models $(M, \Omega_n, \mu_n, p)$ that are invariant under sufficient statistics. It implies that such geometric structures are preserved under Markov congruent embeddings, which are easier to understand. (Chentsov's arguments for finding invariant geometric structures have been re-exposed by Campbell in [9].) Since $\mathcal{M}(\Omega_n,\mu_n)$ is canonically isomorphic to $\mathbb{R}^n_{\ge 0}$, any map $p : M \to \mathcal{M}(\Omega_n,\mu_n)$ can be written as $p := (p_1, \dots, p_n)$. We summarize some results in [14], which are important for our paper, in the following

Proposition 4.17. (1) (cf. [21 Lemma 11.1 p. 157] j Assume that C is a 
continuous function on statistical models (M, 0„, /U„,p) associated with finite 
sample spaces {il-n} such that C is invariant under sufficient statistics. Then 
C is a constant. 

(2) (cf. \14:\ Lemma 11.2, p. 158]) Assume that A is a continuous 1-form 
field on parametrized measure models [M^Vtn^ l^n^P = {pir ' ' ^Pn)) associ- 
ated with finite sample spaces {0,n} such that A is invariant under sufficient 
statistics. Then there is a continuous function c : M — )• M such that for all 
x^M and all V G T,M C A,{V) = c(X:r=iK(^)) lTi=iPidv\^Pi{x). 
In particular, there is no continuous 1-form field on statistical models asso- 
ciated with finite sample spaces that is invariant under sufficient statistics. 

(3) (cf. [141 Theorem 11.1, p. 159]j Assume that F is a continuous 
quadratic form field on parametrized measure models {M,^}n, fJ-njP) associ- 
ated with finite sample spaces such that F is invariant under suffi- 
cient statistics. Then there is a continuous function f : M M such that 
F = f {Y17=i Pii^)) ■ 5^ + where A is the 1-form field described in (2) 
and is the Fisher metric. In particular, the Fisher metric is the unique 
up to a constant quadratic form on statistical models that is invariant under 
sufficient statistics. 

(4) (cf. \14:\ Theorem 12.2, p.l75]j Assume that T is a continuous covari- 
ant symmetric 3-tensor field on parametrized measure models {M^Q.^., ^n,p) 
associated with finite sample spaces such that F is invariant under 
sufficient statistics. Then there is a continuous function t : R — > i? such 
that T = t{Y]l=iPi{x)) ■ T^^ + ■ A2 + AI where and T^^ are the 
Fisher metric and the Amari-Chentsov tensor respectively, and ^1,^2 o,re 
the fields described in (2). 

The argument of Chentsov for proving (2) rests on the permutation invariance, because a map from $\Omega_n$ to itself that permutes the points of $\Omega_n$ is clearly a sufficient statistic. From this permutation invariance, one easily obtains that $A$ has to be of the form given in (2), that is, constant, and that this constant has to vanish in the statistical case. (3) and (4) can then be deduced from some general arguments about tensors. Note that in [14] Chentsov only gave a proof of Proposition 4.17 for statistical models $(M, \Omega_n, \mu_n, p)$. The extension of Proposition 4.17 to parametrized measure models associated with $\Omega_n$ can be obtained easily using the following lemma.

Lemma 4.18. Assume $(M, \Omega, \mu, p)$ is a parametrized measure model and $\kappa : \Omega \to \Omega'$ is sufficient for the parameter $x \in (M, \Omega, \mu, p)$. Then $\kappa$ is also sufficient for the parameter $(x,t) \in (M \times \mathbb{R}_+, \Omega, \mu, p(x,t) := tp(x))$.
Proof. Since $\kappa_*(t\mu) = t\kappa_*(\mu)$ for any finite measure $\mu$ on $\Omega$ and $t \in \mathbb{R}_+$, we

get

d{tp{x)) _ dp{x) 

Taking into account Lemma 4.3, this proves Lemma 4.18. □

Using Lemma 14.181 we obtain the second assertion (2) of Proposition 
14.171 from its particular case for statistical models and the first assertion 
(1), since each 1-form A S A^+(r2„, is a sum of two linear inde- 
pendent 1-forms ^0 ^-nd A-^, where Aq annihilates the tangent hyperplane 
rpA^^^+-+P"(0„,/i„) c TpM+{Qn,fin), and A^ = A-Ao. 

Using the same argument we obtain the third assertion of Proposition 4.17 from its particular case for statistical models and the second assertion.

The last assertion of Proposition 4.17 is obtained from its particular case for statistical models, together with the second and third assertions.

We also note that in [9] Campbell gave a detailed proof of the third assertion of Proposition 4.17 using Chentsov's argument in [14].

5. Markov morphisms and sufficient statistics 

In this section we introduce the notions of a Markov morphism, a $\mu$-representable Markov morphism, and a restricted Markov morphism (Definitions 5.1, 5.2, 5.4), extending the Chentsov notion of a Markov morphism [12] and the notion of a statistical morphism introduced independently by Morse and Sacksteder in [24]. These notions are needed for comparing two statistical models; they stem from the Blackwell concept of "comparison of experiments" in [7]. A novel aspect is our consideration of a parametrization of the parameter space $M$ of a parametrized measure model $(M, \Omega, \mu, p)$ as a restricted Markov morphism (Definition 5.4, Example 5.5). Thus, the geometry of parametrized measure models is intrinsic (Example 5.5). We decompose a Markov morphism associated with a (positive) Markov transition kernel as a composition of a right inverse of a sufficient statistic and a statistic (Theorem 5.10). As a consequence we give a geometric proof of the monotonicity theorem for Markov morphisms (Corollary 5.11).

Positivity assumption. In this section, for the simplicity of the expo- 
sition of the theory, we enlarge the class of parametrized measure models 
to include also {M,^}, fi,p), where p : M Ai{0,,fj,). This assumption is 
caused solely by the fact that if the Markov transition kernel n(a;,a;') is not 
everywhere positive, then the transformed density ^(x, w') := J^Il{uj,uj')dfi 
need not be everywhere positive on (Q' Alternatively, when consid- 
ering Markov transition kernels we restrict ourselves to positive ones. 

Definition 5.1. ([12, p. 194], [24, p. 205]) A Markov transition from a measurable space $(\Omega, \Sigma)$ to a measurable space $(\Omega', \Sigma')$ is a map $T : \Omega \to \mathcal{P}(\Omega', \Sigma')$ such that for each $S \in \Sigma'$ the function $\omega \mapsto \int_S d(T(\omega))$ is a $\Sigma$-measurable function. A Markov transition $T : \Omega \to \mathcal{P}(\Omega', \Sigma')$ defines a Markov morphism $T_* : \mathcal{M}(\Omega, \Sigma) \to \mathcal{M}(\Omega', \Sigma')$ by

(5.1) $T_*(\nu)(S) := \int_\Omega \int_S d(T(\omega))\, d\nu$

for $S \in \Sigma'$.

Since $T(\Omega) \subset \mathcal{P}(\Omega', \Sigma')$, substituting $S := \Omega'$ in (5.1), we obtain

$T_*(\mathcal{M}^a(\Omega, \Sigma)) \subset \mathcal{M}^a(\Omega', \Sigma')$

for all $a \in \mathbb{R}_+$.

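Next, we assume that $T(\omega)$ is dominated by a probability measure $\mu' \in \mathcal{P}(\Omega', \Sigma')$. Then there exists a measurable function $\Pi_\omega : \Omega' \to \mathbb{R}$ such that for all $S \in \Sigma'$ we have

(5.2) $T(\omega)(S) = \int_S \Pi_\omega(\omega')\, d\mu'$.

If $T(\Omega) \subset \mathcal{P}(\Omega', \mu')$, by (5.2), there exists a Markov transition kernel $\Pi : \Omega \times \Omega' \to \mathbb{R}$ from $\Omega$ to $\mathcal{M}(\Omega', \mu')$ such that

(5.3) $T(\omega)(S) = \int_S \Pi(\omega,\omega')\, d\mu'$.

Definition 5.2. If (5.3) holds, $T(\Pi) := T$ is called a $\mu'$-representable Markov transition, and $T(\Pi)_*$ is called a $\mu'$-representable Markov morphism.

Note that any Markov transition kernel $\Pi : \Omega \times \Omega' \to \mathbb{R}$ from $\Omega$ to $\mathcal{P}(\Omega', \mu')$ satisfies

(5.4) $\Pi(\omega,\omega') \ge 0$ for all $(\omega,\omega') \in \Omega \times \Omega'$,

(5.5) $\int_{\Omega'} \Pi(\omega,\omega')\, d\mu' = 1$ for all $\omega \in \Omega$.

Abbreviate $T(\Pi)_*$ as $\Pi_*$. For any measure $\nu \in \mathcal{M}(\Omega)$ and $S \in \Sigma'$ we have

(5.6) $\Pi_*(\nu)(S) = \int_\Omega \int_S \Pi(\omega,\omega')\, d\mu'\, d\nu$.

It follows

(5.7) $\frac{d\Pi_*(\nu)}{d\mu'}(\omega') = \int_\Omega \Pi(\omega,\omega')\, d\nu$.

If $\Omega$, $\Omega'$ are finite sets, then any Markov morphism $T_* : \mathcal{M}(\Omega,\Sigma) \to \mathcal{M}(\Omega',\Sigma')$ is $\mu'$-representable for any dominant measure $\mu'$ on $\Omega'$, see also Example 5.6. This is not true if $\Omega$, $\Omega'$ are open domains in $\mathbb{R}^n$, $n \ge 1$.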

Example 5.3. 1. (cf. [12, p. 511]) Let $(\Omega, \Sigma)$ be a measurable space. We define a Markov transition $T^{Id}$ on $(\Omega, \Sigma)$ by setting

$T^{Id}(\omega)(A) := \chi_A(\omega)$ for $\omega \in \Omega$,

where $\chi_A$ is the indicator function of $A \in \Sigma$. Clearly $T^{Id}$ defines a Markov morphism which is the identity transformation of $\mathcal{P}(\Omega, \Sigma)$. Note that $T^{Id}_*$ is not a $\mu$-representable Markov morphism for any measure $\mu \in \mathcal{M}(\Omega, \Sigma)$ if $\Omega$ is an open domain in $\mathbb{R}^n$ with Borel $\sigma$-algebra $\Sigma$, and $n \ge 1$. To see this, we note that if $\mu$ dominates all the measures $T^{Id}(\omega)$, $\omega \in \Omega$, then $\mu$ has no nonempty null set, in particular $\mu(\{\omega\}) > 0$ for all $\omega \in \Omega$. It is easy to see that this is impossible, since $\dim \Omega \ge 1$.

2. Assume that $\kappa : \Omega_1 \to \Omega_2$ is a statistic. Then $\kappa$ defines a Markov transition $T^\kappa$ from $(\Omega_1, \Sigma_1)$ to $(\Omega_2, \Sigma_2)$ by setting

(5.8) $T^\kappa(\omega_1)(A) := \chi_A(\kappa(\omega_1))$ for $\omega_1 \in \Omega_1$

and $A \in \Sigma_2$. For $\nu \in \mathcal{M}(\Omega_1)$ and $S \in \Sigma_2$, using (5.1), we get

$T^\kappa_*(\nu)(S) = \int_{\Omega_1} \chi_S(\kappa(\omega_1))\, d\nu = \int_{\kappa^{-1}(S)} d\nu$.

Hence $T^\kappa_* = \kappa_*$. Then $T^\kappa_*$ is not a $\mu_2$-representable Markov morphism for any $\mu_2 \in \mathcal{M}(\Omega_2)$ if, for instance, $\kappa(\Omega_1)$ and $\Omega_2$ are open domains in $\mathbb{R}^n$, $n \ge 1$, since there exists $\nu \in \mathcal{M}(\Omega_1)$ such that $\kappa_*(\nu)$ is not dominated by $\mu_2$.

Denote by C^(Mi,M2) the space of all differentiable maps from a dif- 
ferentiable manifold Mi to a differentiable manifold M2. Let and 
be measurable spaces. Denote by 9Jt(r2i,r22) the set of all Markov 
morphisms from A4{il.i) to 7W(il2)- 

Definition 5.4. Assume that $(M_1, \Omega_1, \mu_1, p_1)$ and $(M_2, \Omega_2, \mu_2, p_2)$ are parametrized
measure models. A pair $(f \in C^1(M_1, M_2), T_* \in \mathfrak{M}(\Omega_1, \Omega_2))$ is called a
restricted Markov morphism, if for all $x \in M_1$

(5.9) $p_2(f(x)) = T_*(p_1(x))$.

Example 5.5. 1. Assume that $(M, \Omega_1, \mu_1, p_1)$ is a parametrized measure
model and $\kappa : \Omega_1 \to \Omega_2$ is a statistic. Then $(M, \Omega_2, \kappa_*(\mu_1), \kappa_*(p_1))$ is a
parametrized measure model. By Example 5.3.2 the pair $(Id, \kappa_*)$ is a Markov
morphism. We also call $(Id, \kappa_*)$ a statistic if no misunderstanding occurs.

2. Assume that $(M_2, \Omega_2, \mu_2, p_2)$ is a parametrized measure model and $f :
M_1 \to M_2$ is a smooth map. Then $(M_1, \Omega_2, \mu_2, p_1 := p_2 \circ f)$ is a parametrized
measure model and the pair $(f, Id)$ is a Markov morphism. Such a Markov
morphism is said to be generated by the smooth map $f$. It is easy to see that, if $f$
is a diffeomorphism, then the Amari-Chentsov structure on $M_1$ is obtained
from the Amari-Chentsov structure on $M_2$ via the pull-back map $f^*$.

Example 5.6. Let $(\Omega_n, \mu_n)$ and $(\Omega_m, \mu_m)$ be the measure spaces in Example
4.14. Let $\Pi : \Omega_n \times \Omega_m \to \mathbb{R}$ be a mapping such that $\Pi_{ij} := \Pi(E_i, F_j)$ satisfies
the following conditions:

$\Pi_{ij} \ge 0$ for all $1 \le i \le n$, $1 \le j \le m$,

(5.10) $\sum_{j=1}^{m} \Pi_{ij} = 1$ for all $1 \le i \le n$.
Clearly, $\Pi$ is a Markov transition kernel from $\Omega_n$ to $\mathcal{M}(\Omega_m, \mu_m)$. By (5.6)
$\Pi$ induces a map

(5.11) $\Pi_*(E_k^*)(F_j) := \sum_{i=1}^{n} \Pi_{ij}\, E_k^*(E_i) = \Pi_{kj}$.

Hence

(5.12) $\Pi_*(E_k^*) = \sum_{j=1}^{m} \Pi_{kj}\, F_j^*$.

Let

$(M_1 := \mathcal{P}_+(\Omega_n, \mu_n), \Omega_n, \mu_n, p_1(x) := x)$,

$(M_2 := \mathcal{P}_+(\Omega_m, \mu_m), \Omega_m, \mu_m, p_2(y) := y)$

be statistical models. By (5.9), a pair $(f \in \mathrm{Diff}(M_1, M_2), \Pi_* \in \mathfrak{M}(\Omega_n, \Omega_m))$
is a Markov morphism, if and only if for all $x \in M_1$

(5.13) $f(x)(F_j) = \Pi_*(x)(F_j)$ for all $1 \le j \le m$.

Thus for $\Pi_* \in \mathfrak{M}(\Omega_n, \Omega_m)$ the pair $(f, \Pi_*)$ is a Markov morphism if and only
if $f = \Pi_*|_{M_1}$. We also abbreviate $(\Pi_*|_{M_1}, \Pi_*)$ as $\Pi_*$ if no misunderstanding
occurs.
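The finite case of Example 5.6 can be checked by hand or by machine. The following is a minimal sketch, assuming counting reference measures on both sample spaces and a hypothetical $2 \times 3$ kernel; the names `is_markov_kernel` and `push_forward` are illustrative and not taken from the paper.

```python
# Finite sample spaces Omega_n = {E_1..E_n}, Omega_m = {F_1..F_m}.
# Assumption of this sketch: both carry the counting measure, so a measure
# is just a list of non-negative point masses.

def is_markov_kernel(Pi, tol=1e-12):
    """Check Pi_ij >= 0 and sum_j Pi_ij = 1 for every row i, cf. (5.10)."""
    return all(
        all(entry >= 0 for entry in row) and abs(sum(row) - 1.0) <= tol
        for row in Pi
    )

def push_forward(Pi, nu):
    """Pi_*(nu)(F_j) = sum_i Pi_ij * nu(E_i), the finite form of (5.6)."""
    n, m = len(Pi), len(Pi[0])
    return [sum(Pi[i][j] * nu[i] for i in range(n)) for j in range(m)]

# Hypothetical kernel with n = 2 and m = 3.
Pi = [[0.5, 0.5, 0.0],
      [0.0, 0.25, 0.75]]

dirac_E1 = [1.0, 0.0]   # the Dirac measure E_1^*
x = [0.4, 0.6]          # a point of P_+(Omega_2)

image_of_dirac = push_forward(Pi, dirac_E1)   # first row of Pi, cf. (5.11)
image_of_x = push_forward(Pi, x)              # again a probability measure
```

In this sketch the image of a Dirac measure is the corresponding row of the kernel, and total mass is preserved because every row sums to one.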

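Next we drop the assumption that $n \le m$. Note that there is a canonical
map

$\chi_m : \Omega_m \to \mathcal{M}(\Omega_m, \mu_m)$.

Let $\kappa : \Omega_n \to \Omega_m$ be a statistic. The composition $\chi_m \circ \kappa : \Omega_n \to \mathcal{M}(\Omega_m, \mu_m)$
defines the following map

(5.14) $\Pi^{\kappa}(E_i, F_j) := \langle \chi_m \circ \kappa(E_i), F_j \rangle$.

Clearly $\sum_{j=1}^{m} \Pi^{\kappa}(E_i, F_j) = 1$ for all $i$. Hence $\Pi^{\kappa}$ is a Markov transition
kernel. Note that $\Pi^{\kappa}_* : \mathcal{M}(\Omega_n, \mu_n) \to \mathcal{M}(\Omega_m, \mu_m)$ coincides with the
push-forward map $\kappa_* : \mathcal{M}(\Omega_n, \mu_n) \to \mathcal{M}(\Omega_m, \mu_m)$.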
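The coincidence of the kernel push-forward with $\kappa_*$ is easy to verify on finite sets. The sketch below assumes that the kernel attached to a statistic in (5.14) is the 0/1 matrix with a single 1 in each row, at the column $\kappa(E_i)$; the statistic and the measure are hypothetical examples.

```python
# A statistic kappa : Omega_n -> Omega_m is stored as a list of column indices.

def kernel_from_statistic(kappa, m):
    """Assumed finite form of (5.14): entry (i, j) is 1 if kappa(E_i) = F_j, else 0."""
    return [[1.0 if kappa[i] == j else 0.0 for j in range(m)]
            for i in range(len(kappa))]

def push_forward_by_kernel(Pi, nu):
    """Pi_*(nu)(F_j) = sum_i Pi_ij * nu(E_i)."""
    return [sum(Pi[i][j] * nu[i] for i in range(len(Pi)))
            for j in range(len(Pi[0]))]

def push_forward_by_statistic(kappa, nu, m):
    """kappa_*(nu)(F_j) = nu(kappa^{-1}(F_j))."""
    out = [0.0] * m
    for i, mass in enumerate(nu):
        out[kappa[i]] += mass
    return out

kappa = [0, 0, 1, 1]            # hypothetical statistic Omega_4 -> Omega_2
nu = [0.1, 0.2, 0.3, 0.4]       # hypothetical measure on Omega_4

Pi_kappa = kernel_from_statistic(kappa, 2)
via_kernel = push_forward_by_kernel(Pi_kappa, nu)
via_statistic = push_forward_by_statistic(kappa, nu, 2)
```

Both routes collect the mass of each fibre $\kappa^{-1}(F_j)$, so the two results agree up to rounding.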

Proposition 5.7. A linear mapping $\Pi : \mathbb{R}^n \to \mathbb{R}^m$ is a Markov congruent
embedding subjected to a statistic $\kappa$, if and only if $\kappa_* \circ \Pi(x) = x$ for all
$x \in \mathbb{R}^n_{\ge 0}$. A Markov mapping $\Pi : \mathbb{R}^n \to \mathbb{R}^m$ has a left inverse if and only if
it is a Markov congruent embedding.

The first assertion of Proposition 5.7 is obvious. The second assertion of
Proposition 5.7 is a reformulation of Lemma 6.1, p. 77 and Lemma 9.5,
p. 136].
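The criterion $\kappa_* \circ \Pi(x) = x$ in Proposition 5.7 can be illustrated on a small example. The sketch below assumes a hypothetical statistic from a three-point space onto a two-point space and two hypothetical row-stochastic matrices: `Q` spreads the mass of each $E_i$ only over the fibre $\kappa^{-1}(E_i)$, while `Q_bad` leaks mass into another fibre.

```python
def markov_map(Q, x):
    """Pi(x)_j = sum_i x_i * Q[i][j] for a row-stochastic matrix Q."""
    return [sum(x[i] * Q[i][j] for i in range(len(Q)))
            for j in range(len(Q[0]))]

def push_forward(kappa, y, n):
    """kappa_*(y)(E_i) = total mass of y on the fibre kappa^{-1}(E_i)."""
    out = [0.0] * n
    for j, mass in enumerate(y):
        out[kappa[j]] += mass
    return out

kappa = [0, 0, 1]               # F_1, F_2 -> E_1 and F_3 -> E_2 (hypothetical)

Q = [[0.3, 0.7, 0.0],           # mass of E_1 stays inside kappa^{-1}(E_1)
     [0.0, 0.0, 1.0]]           # mass of E_2 stays inside kappa^{-1}(E_2)

Q_bad = [[0.3, 0.3, 0.4],       # mass of E_1 leaks into kappa^{-1}(E_2)
         [0.0, 0.0, 1.0]]

x = [2.0, 5.0]
recovered = push_forward(kappa, markov_map(Q, x), 2)
not_recovered = push_forward(kappa, markov_map(Q_bad, x), 2)
```

Only the first matrix satisfies the identity, which matches the intuition that a congruent embedding refines the sample space without mixing fibres.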

Let $(M, \Omega_1, \mu_1, p)$ be a parametrized measure model, $(\Omega_2, \mu_2)$ a measure
space and $\Pi : \Omega_1 \times \Omega_2 \to \mathbb{R}$ a Markov transition kernel from $\Omega_1$ to $\mathcal{P}(\Omega_2, \mu_2)$.
Set for each $x \in M$

(5.15) $\Pi^{[p]}(x, \omega_1, \omega_2) := \Pi(\omega_1, \omega_2)\, p(x, \omega_1)$.

By (5.5) we get

(5.16) $\left( \int_{\Omega_2} \Pi^{[p]}(x, \omega_1, \omega_2)\, d\mu_2 \right) d\mu_1 = p(x, \omega_1)\, d\mu_1$.

Lemma 5.8. Then $(M, \Omega_1 \times \Omega_2, \mu_1\mu_2, \Pi^{[p]}(x, \omega_1, \omega_2))$ is a (generalized)
statistical model. Moreover, the Amari-Chentsov structure on $(M, \Omega_1 \times
\Omega_2, \mu_1\mu_2, \Pi^{[p]})$ coincides with the Amari-Chentsov structure on $(M, \Omega_1, \mu_1, p)$.

Proof. The first assertion of Lemma 5.8 follows from Lemma 4.3 and
Corollary ESI

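We present two proofs of the second assertion of Lemma 5.8.

First proof. Let us compute the Fisher quadratic form on $(M, \Omega_1 \times \Omega_2, \mu_1\mu_2, \Pi^{[p]})$
using (5.15) and (5.5):

$g^{[p]}(V, W)_x = \int_{\Omega_1 \times \Omega_2} (\partial_V \ln \Pi^{[p]}(x, \omega_1, \omega_2)) (\partial_W \ln \Pi^{[p]}(x, \omega_1, \omega_2))\, \Pi^{[p]}(x, \omega_1, \omega_2)\, d\mu_2\, d\mu_1$

$= \int_{\Omega_1 \times \Omega_2} (\partial_V \ln p(x, \omega_1)) (\partial_W \ln p(x, \omega_1))\, p(x, \omega_1)\, \Pi(\omega_1, \omega_2)\, d\mu_2\, d\mu_1$

(5.17) $= \int_{\Omega_1} (\partial_V \ln p(x, \omega_1)) (\partial_W \ln p(x, \omega_1))\, p(x, \omega_1)\, d\mu_1$.

In the same way, taking into account

$\partial_V \ln \Pi^{[p]}(x, \omega_1, \omega_2) = \partial_V \ln p(x, \omega_1)$,
$\int_{\Omega_2} \Pi(\omega_1, \omega_2)\, d\mu_2 = 1$,

we conclude that the Amari-Chentsov tensor $T^{[p]}$ on $(M, \Omega_1 \times \Omega_2, \mu_1\mu_2, \Pi^{[p]})$
coincides with the Amari-Chentsov tensor on $(M, \Omega_1, \mu_1, p)$. This completes
the proof of Lemma 5.8.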

Second proof. Comparing (4.1) with (5.15) we observe that $\pi_1 : \Omega_1 \times \Omega_2 \to
\Omega_1$ is a sufficient statistic with respect to the parameter $x \in (M, \Omega_1 \times
\Omega_2, \mu_1\mu_2, \Pi^{[p]}(x, \omega_1, \omega_2))$. Thus Lemma 5.8 is a consequence of Theorem
KTT\ 2. □
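The first proof of Lemma 5.8 can be mirrored numerically on finite sample spaces. The following is a sketch under stated assumptions: counting reference measures, a hypothetical one-parameter family $p(x, \omega_1) = e^{x a_{\omega_1}}$, a hypothetical positive kernel, and central finite differences in place of exact derivatives. Order 2 corresponds to the Fisher quadratic form and order 3 to the Amari-Chentsov tensor, each evaluated on the coordinate vector field.

```python
import math

A = [-1.0, 0.5, 2.0]        # hypothetical slopes a_i of the family on Omega_1
KERNEL = [[0.2, 0.8],       # hypothetical positive Markov kernel Pi(omega_1, omega_2)
          [0.6, 0.4],
          [0.5, 0.5]]

def p(x, i):
    """Density p(x, omega_1) with respect to the counting measure."""
    return math.exp(x * A[i])

def p_joint(x, i, j):
    """The density Pi(omega_1, omega_2) * p(x, omega_1) of (5.15)."""
    return KERNEL[i][j] * p(x, i)

def d_log(f, x, h=1e-5):
    """Central-difference approximation of d/dx ln f(x)."""
    return (math.log(f(x + h)) - math.log(f(x - h))) / (2.0 * h)

def tensor_original(x, order):
    """Sum over Omega_1 of (d_x ln p)^order * p."""
    return sum(d_log(lambda t: p(t, i), x) ** order * p(x, i)
               for i in range(len(A)))

def tensor_joint(x, order):
    """Sum over Omega_1 x Omega_2 of (d_x ln Pi^[p])^order * Pi^[p]."""
    return sum(d_log(lambda t: p_joint(t, i, j), x) ** order * p_joint(x, i, j)
               for i in range(len(A)) for j in range(len(KERNEL[0])))
```

At any parameter value the two functions should agree for both orders, because the kernel factor does not depend on $x$ and integrates to one over $\Omega_2$.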

Combining the second proof of Lemma 5.8 and Example 5.5 we obtain

Corollary 5.9. Let $(M, \Omega_1, \mu_1, p)$ be a parametrized measure model. The
projection $\pi_1 : \Omega_1 \times \Omega_2 \to \Omega_1$ is a sufficient statistic for the parametrized
measure model $(M, \Omega_1 \times \Omega_2, \mu_1\mu_2, \Pi^{[p]})$.

Theorem 5.10. Let $(Id, \Pi_*) : (M_1, \Omega_1, \mu_1, p_1) \to (M_1, \Omega_2, \mu_2, p_2)$ be a
restricted Markov morphism, where $\Pi_*$ is $\mu_2$-representable by a positive
Markov kernel. Then $(Id, \Pi_*)$ is a composition of a right inverse of a
sufficient statistic and a statistic.

Proof. Let us denote by $\pi_2 : \Omega_1 \times \Omega_2 \to \Omega_2$ the projection onto the second
factor. We observe that $(Id, \Pi_*)$ is a composition of two maps: $(Id, \Pi_{1,12}) :
(M_1, \Omega_1, \mu_1, p_1) \to (M_1, \Omega_1 \times \Omega_2, \mu_1\mu_2, p_{12}(x, \omega_1, \omega_2) := p_1(x, \omega_1) \cdot \Pi(\omega_1, \omega_2))$ and the
push-forward map $(\pi_2)_*$. The map $(Id, \Pi_{1,12})$ is a right inverse of the sufficient
statistic $(Id, (\pi_1)_*)$ by Corollary 5.9. This completes the proof of Theorem
5.10. □

Let $M_1 = M_2$. A restricted Markov morphism of the form $(f, T_*)$ is called
representable if $f$ is a diffeomorphism, and $T_*$ is $\mu$-representable.

Corollary 5.11. (cf. 01 p. 31]) 1. Representable restricted Markov morphisms
decrease the Fisher metric on statistical models.

2. The Fisher metric is the unique, up to a constant, weakly continuous
quadratic 2-form field on statistical models associated with finite sample
spaces $\{\Omega_n\}$ that is monotone under representable restricted Markov
morphisms.

Proof. The first assertion of Corollary 5.11 is an immediate consequence of
Theorem 5.10 and Theorem 4.11.2.

The second assertion of Corollary 5.11 is a consequence of Theorem 5.10
and Proposition 4.17, taking into account the following fact. A congruent
Markov embedding $\Pi : \mathcal{P}(\Omega_n, \mu_n) \to \mathcal{P}(\Omega_m, \mu_m)$ subjected to a statistic $\kappa$
satisfies $\kappa_* \circ \Pi = Id$ by Proposition 5.7. Since any quadratic form field
on $(\mathcal{M}_+(\Omega_n, \mu_n), \Omega_n, \mu_n, p(x) := x)$ that is monotone under Markov morphisms
is monotone under Markov congruent embeddings, it follows that such a
quadratic form is invariant under sufficient statistics and also invariant
under Markov congruent embeddings. Chentsov's result implies that such a
quadratic form is the Fisher metric up to a constant. □
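The monotonicity in the first assertion of Corollary 5.11 can be observed on a small finite example. This is a sketch only, assuming counting reference measures, a hypothetical family $p(x, E_i) = e^{x a_i}$ on a three-point space, and a hypothetical positive kernel onto a two-point space. Since the push-forward is linear, the derivative of the pushed family is the push-forward of the derivative.

```python
import math

A = [-1.0, 0.0, 2.0]            # hypothetical slopes a_i

def p(x):
    """Point masses p(x, E_i) = exp(x * a_i)."""
    return [math.exp(x * a) for a in A]

def dp(x):
    """Derivative of the point masses with respect to x."""
    return [a * math.exp(x * a) for a in A]

def push(Pi, v):
    """Push a vector of point masses forward through the kernel Pi."""
    return [sum(Pi[i][j] * v[i] for i in range(len(Pi)))
            for j in range(len(Pi[0]))]

def fisher(density, derivative):
    """Fisher form on the coordinate field: sum of (dq)^2 / q."""
    return sum(d * d / q for q, d in zip(density, derivative))

Pi = [[0.9, 0.1],               # hypothetical positive Markov kernel
      [0.5, 0.5],
      [0.2, 0.8]]

x0 = 0.0
g_before = fisher(p(x0), dp(x0))
g_after = fisher(push(Pi, p(x0)), push(Pi, dp(x0)))
```

In this example the Fisher form drops from 5 to roughly 1.76, consistent with the information loss caused by mixing sample points.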

The second assertion of Corollary 5.11 is also valid for statistical models
associated with infinite sample spaces. A proof of this assertion will be given
in a forthcoming paper.

6. Proof of the Main Theorem 

Our proof of the Main Theorem (Theorem 2.9) is based on the following
main observation. For each step function $\tau$ on $(\Omega, \mu)$ subject to a statistic
$\kappa : (\Omega, \mu) \to \Omega_n := \{E_1, \dots, E_n\}$ (Definition 6.1) there exist a parametrized
measure model $(M, \Omega, \mu, p)$ and a vector $V \in T_xM$ such that $p(x) = \mu$ and
$\partial_V \ln p = \tau$; moreover, $\kappa$ is sufficient with respect to the parameter $x \in M$
(Lemma 6.2). Thus, the computation of any pointwise continuous covariant
$k$-tensor field on $\mathcal{M}(\Omega)$ whose induced $k$-tensor field on parametrized
measure models is invariant under sufficient statistics is reduced to the case
$\Omega = \Omega_n$, which has been considered by Chentsov for $k = 1, 2, 3$.

Definition 6.1. (cf. Example 4.14) Let $(\Omega, \mu)$ be a finite measure space and
let $\Omega = D_1 \,\dot\cup\, \dots \,\dot\cup\, D_n$ be a decomposition, where each $D_i$ is measurable. Denote
by $\kappa$ the associated statistic $\Omega \to \Omega_n$, $\kappa(D_i) := E_i$. A function $\tau : \Omega \to \mathbb{R}$ is
called a step function subject to $\kappa$, if $\tau(\omega) = \sum_i \tau_i \cdot \chi_{D_i}(\omega)$, where $\tau_i \in \mathbb{R}$ and
$\chi_{D_i}$ is the characteristic function of $D_i$.
Lemma 6.2. Let $M = (0, 1)$ and $\Omega$ be a smooth manifold. Given a finite
measure $\mu \in \mathcal{M}(\Omega)$, a point $x_0 \in M$, and a step function $\tau := \sum_i \tau_i \chi_{D_i}$ on $\Omega$,
subject to a statistic $\kappa : (\Omega, \mu) \to \Omega_n$, there exist a $k$-integrable parametrized
measure model $(M, \Omega, \mu, p)$ and $V \in T_{x_0}M$ such that

(1) $\kappa$ is sufficient for the parameter in $M$,

(2) $p(x_0) = \mu$,

(3) $\partial_V \ln p = \sum_i \tau_i \chi_{D_i}$.

Proof. Note that $\kappa$ is a sufficient statistic for a $k$-integrable parametrized
measure model $(M, \Omega, \mu, p)$ iff $p$ is given as in Definition 14. H i.e.

$\ln p(x, \omega) = \ln p(x, \kappa(\omega)) + \ln t(\omega) = \sum_{i=1}^{n} s_i(x)\, \chi_{D_i}(\omega) + \ln t(\omega)$

for smooth functions $s_i : M \to \mathbb{R}$ and $t \in L^1(\Omega)$. For such $\ln p(x, \omega)$ the
conditions (2) and (3) are equivalent to the following:

• $\sum_{i=1}^{n} s_i(x_0)\, \chi_{D_i}(\omega) + \ln t(\omega) = 0$,

• $\sum_{i=1}^{n} \partial_V s_i(x_0)\, \chi_{D_i}(\omega) = \sum_{i=1}^{n} \tau_i\, \chi_{D_i}(\omega)$.

Set $t(\omega) = 1$. The existence of functions $s_i(x)$ satisfying the listed conditions
is obvious: it suffices to choose smooth $s_i$ such that $s_i(x_0) = 0$ and $\partial_V s_i(x) =
\tau_i$. In fact, we can simply take $V = \partial_x$ and $s_i(x) = (x - x_0)\tau_i$. Finally, one
verifies that the defined parametrized measure model is $k$-integrable, since
the $s_i$ are smooth. □
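The explicit choice $s_i(x) = (x - x_0)\tau_i$ and $t = 1$ from the proof can be tested on a toy sample space. The sketch below assumes a hypothetical six-point $\Omega$ split into two blocks, a hypothetical base point $x_0 = 0.5$, and a finite-difference derivative; the density is taken with respect to $\mu$, so $p(x_0) = \mu$ means that the density equals 1 at $x_0$.

```python
import math

X0 = 0.5                         # hypothetical base point in M = (0, 1)
BLOCK_OF = [0, 0, 0, 1, 1, 1]    # kappa: index of the block D_i containing omega
TAU = [2.0, -1.0]                # hypothetical step values tau_1, tau_2

def tau(w):
    """The step function tau = sum_i tau_i * chi_{D_i}."""
    return TAU[BLOCK_OF[w]]

def density(x, w):
    """p(x, omega) = exp(sum_i s_i(x) chi_{D_i}(omega)) with s_i(x) = (x - x0) tau_i."""
    return math.exp((x - X0) * tau(w))

def d_log_density(x, w, h=1e-6):
    """Central-difference approximation of d/dx ln p(x, omega)."""
    return (math.log(density(x + h, w)) - math.log(density(x - h, w))) / (2.0 * h)
```

The density depends on $\omega$ only through the block index, which is the factorization that makes $\kappa$ sufficient in this construction.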

Proof of the Main Theorem. 1. Let $A$ be a pointwise continuous 1-tensor
field on $\mathcal{M}(\Omega)$ satisfying the condition (1) in the Main Theorem. To prove
the first assertion of the Main Theorem, it suffices to assume that $V$ is a
step function $\sum_i \tau_i \chi_{D_i}$ (using again the identification between the tangent
vector $V$ and $\partial_V \ln p$) subject to a statistic $\kappa : \Omega \to \Omega_n$. By Lemma 6.2
there exists a 1-generalized statistical model $(M, \Omega, \mu, p)$ such that

(1) $p(x, \omega) = e^{\sum_i s_i(x) \chi_{D_i}(\omega)}$, where $s_i \in C^\infty(M)$, hence $\kappa$ is sufficient for the
parameter $x \in M$,

(2) $p(x_0) = \mu$,

(3) $\partial_V \ln p(x, \omega) = \sum_i \tau_i \chi_{D_i}$.

Set

$d_i := \mu(D_i)$.

Then $\kappa_*(\mu) = \sum_i d_i E_i^*$, where $E_i^*$ is the Dirac measure concentrated at $E_i$.
Since $A$ is associated with a statistical field which is invariant under $\kappa_*$, we
have

(6.1) $A_\mu(\tau) = A_{\kappa_*(\mu)}(\partial_V \ln \kappa_*(p)) = A_{(d_1, \dots, d_n)}(\tau_1, \dots, \tau_n) = c\left(\sum_{i=1}^{n} d_i\right) \sum_{i=1}^{n} d_i \tau_i$,

where $c$ is the function defined in Proposition 4.17.2. Note that

$\int_\Omega \tau\, d\mu = \sum_{i=1}^{n} d_i \tau_i$.

This proves the first assertion in the Main Theorem. The next assertions of
the Main Theorem concerning the specification of the covariant 1-tensor field $A$
follow immediately.

2. Now assume that $F$ is a pointwise continuous quadratic form on $\mathcal{M}(\Omega)$
and $\mu$ is a finite measure. To prove the second assertion of the Main Theorem
we follow the same line of arguments as above. It suffices to prove the
validity of the second assertion for a step function $\tau$ on $\Omega$, since $F$ is a
quadratic form (otherwise we have to consider step functions subjected to
different statistics). We deduce the second assertion of the Main Theorem
from Proposition 4.17.2 using the observation that the Fisher metric on
$\mathcal{M}(\Omega, \mu)$ applied to $\tau$,

$\int_\Omega \tau^2\, d\mu = \sum_{i=1}^{n} \tau_i^2\, \mu(D_i)$,

is equal to the Fisher metric applied to $\kappa_*(\tau) = (\tau_1, \dots, \tau_n)$,

$\sum_{i=1}^{n} d_i \tau_i^2$.

3. The last assertion of the Main Theorem is proven in the same way.
It follows from Proposition 4.17.2 using the observation that the
Amari-Chentsov 3-symmetric tensor on $\mathcal{M}(\Omega, \mu)$ applied to $\tau$,

$\int_\Omega \tau^3\, d\mu = \sum_{i=1}^{n} \tau_i^3\, \mu(D_i)$,

is equal to the Amari-Chentsov tensor applied to $\kappa_*(\tau) = (\tau_1, \dots, \tau_n)$,

$\sum_{i=1}^{n} d_i \tau_i^3$.
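The reduction used in all three steps is that the moments of a step function on $(\Omega, \mu)$ coincide with the corresponding weighted sums on $\Omega_n$, with weights given by the block masses $\mu(D_i)$. The sketch below checks this for orders 1, 2 and 3, assuming a hypothetical six-point $\Omega$ with hypothetical point masses, blocks and step values.

```python
MU = [0.5, 1.0, 0.25, 2.0, 0.75, 1.5]   # hypothetical point masses of mu on Omega
BLOCK_OF = [0, 0, 1, 1, 1, 2]           # kappa: index of the block D_i containing omega
TAU = [3.0, -1.0, 0.5]                  # hypothetical step values tau_1, tau_2, tau_3

def block_masses():
    """The weights mu(D_i), that is, the point masses of kappa_*(mu)."""
    d = [0.0] * len(TAU)
    for w, mass in enumerate(MU):
        d[BLOCK_OF[w]] += mass
    return d

def moment_on_omega(k):
    """Integral of tau^k with respect to mu over Omega."""
    return sum(TAU[BLOCK_OF[w]] ** k * mass for w, mass in enumerate(MU))

def moment_on_omega_n(k):
    """Sum over Omega_n of mu(D_i) * tau_i^k."""
    return sum(d_i * tau_i ** k for d_i, tau_i in zip(block_masses(), TAU))
```

Orders 1, 2 and 3 correspond to the 1-tensor, the Fisher quadratic form and the Amari-Chentsov tensor, each evaluated on the step function.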

To complete the proof of the Main Theorem we need to show that

(1) all the tensor fields described in the Main Theorem are weakly
continuous on $n$-integrable parametrized measure models,

(2) the tensor field $A$ is invariant under sufficient statistics.

Note that (1) holds since the associated $k$-tensor fields $\tau$ on $\mathcal{M}(\Omega)$ are
weakly bounded, i.e., for any $\mu \in \mathcal{M}(\Omega)$ there is a continuous function
$c : \mathcal{M}(\Omega, \mu) \to \mathbb{R}$ such that $|\tau_\mu(V)| \le c(\mu)\, \|V\|_{L^n(\Omega, \mu)}$.
The proof of (2) is similar to the proof of Theorem 4.5, observing that

$\partial_V p(x) = (\partial_V \ln p(x, \cdot))\, p(x, \cdot)\, \mu$

for the measure $p(x) = p(x, \cdot)\, \mu$ (cf. Remark 2.4), and is hence omitted. □

Remark 6.3. Our proof of the Main Theorem is based on the fact that the
step functions are dense in the value space $L^n(\Omega, \mu)$. The same proof applies
if we replace the value space $L^n(\Omega, \mu)$ by another function space depending
on the finite measure $\mu$ where the step functions are dense.